<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Series-Tinkerbell on ln --help</title>
    <link>https://blog.mei-home.net/tags/series-tinkerbell/</link>
    <description>Recent content in Series-Tinkerbell on ln --help</description>
    <generator>Hugo -- 0.147.2</generator>
    <language>en</language>
    <lastBuildDate>Tue, 15 Jul 2025 22:50:11 +0200</lastBuildDate>
    <atom:link href="https://blog.mei-home.net/tags/series-tinkerbell/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Tinkerbell Part V: Booting HookOS on a Pi 4</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-5-hookos-direct-boot/</link>
      <pubDate>Tue, 15 Jul 2025 22:50:11 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-5-hookos-direct-boot/</guid>
      <description>I&amp;#39;m trying to boot Tinkerbell&amp;#39;s HookOS on a Pi 4 without iPXE/EFI</description>
      <content:encoded><![CDATA[<p>In this post, I will describe my failed attempts at booting Tinkerbell&rsquo;s in-memory
HookOS directly on a Pi 4, without iPXE or UEFI.</p>
<p>This is part 5 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p>In my <a href="https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/">previous post</a>, I
described how I provisioned a Pi 4 using Tinkerbell&rsquo;s standard way via UEFI
and iPXE. This was a complicated and convoluted process, requiring heavy use of
Dnsmasq on the side and bouncing between requests to said Dnsmasq and Tinkerbell
itself. In the end, I was only able to do it after completely switching off
Tinkerbell&rsquo;s DHCP functionality. I wasn&rsquo;t particularly fond of that option,
because I quite liked how it worked for provisioning the VM in my first
experiments. I didn&rsquo;t want to completely switch off DHCP in Tinkerbell just
because of the Pi 4.</p>
<p>Another pretty big issue was the Pi 5. From everything I could see, the
<a href="https://github.com/worproject/rpi5-uefi">Pi 5 UEFI project</a> is dead right now.
So iPXE/UEFI is not an option for the Pi 5 anyway, and I&rsquo;m already
running three of those, with a fourth one planned.</p>
<h2 id="the-potential-solution-with-direct-boot">The potential solution with direct boot</h2>
<p>So I took a look at what Tinkerbell&rsquo;s provisioning actually does. Its core part
is the Tink workflow engine, running on <a href="https://github.com/tinkerbell/hook">HookOS</a>.
This is Tinkerbell&rsquo;s in-memory provisioning OS. The only task it has is to provide
a Linux environment with Docker to run the provisioning tasks. And it&rsquo;s not a
special Linux really, just one which runs entirely from the initramfs.</p>
<p>So the only thing I really needed was the ability to boot into the HookOS kernel,
which, again, isn&rsquo;t anything special, and then run HookOS&rsquo; initramfs.
And that&rsquo;s already possible with the Pi&rsquo;s netboot mechanism. You can provide
the name of a kernel and an initramfs, and the Pi&rsquo;s firmware will download those
from the TFTP server it is pointed to during DHCP discovery. This is at least a simpler
approach than needing to work with UEFI and iPXE. And it has the advantage that
it should also work with the Pi 5.</p>
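<p>Whether the Pi 4 attempts a network boot at all is controlled by the <code>BOOT_ORDER</code> setting in its bootloader EEPROM configuration, which <code>rpi-eeprom-config</code> prints. The bootloader reads the value from right to left, one hex digit per boot mode. Here is a small sketch for decoding it. The function name <code>decode_boot_order</code> is hypothetical, and the mode names follow my reading of the Raspberry Pi bootloader documentation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: decode a Pi 4 BOOT_ORDER value (e.g. 0xf21) into
# the boot modes the bootloader tries, in order, one per line.
# Runs in a subshell so its variables don't leak into the caller.
decode_boot_order() (
  hex="${1#0x}"
  while [ -n "$hex" ]; do
    nibble="${hex#"${hex%?}"}"   # rightmost hex digit is tried first
    hex="${hex%?}"               # drop it and continue to the left
    case "$nibble" in
      0) echo "sd-card-detect" ;;
      1) echo "sd-card" ;;
      2) echo "network" ;;
      3) echo "rpiboot" ;;
      4) echo "usb-msd" ;;
      5) echo "bcm-usb-msd" ;;
      6) echo "nvme" ;;
      7) echo "http" ;;
      e|E) echo "stop" ;;
      f|F) echo "restart" ;;
      *) echo "unknown:$nibble" ;;
    esac
  done
)
</code></pre></div><p>For example, <code>decode_boot_order 0xf21</code> prints <code>sd-card</code>, <code>network</code> and <code>restart</code>, one per line. That means the bootloader tries the SD card first, then netboot, then starts over.</p>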
<p>There are a couple of additional issues with this solution, mainly that I would
still like a tighter connection with Tinkerbell&rsquo;s DHCP side. But for now, I&rsquo;m
mostly interested in seeing what my overall options with Tinkerbell and the
Raspberry Pi&rsquo;s netboot process are. Then I will think a bit more about potential
changes I could propose to the Tinkerbell project.</p>
<h2 id="trying-the-official-hookos-release">Trying the official HookOS release</h2>
<p>The newest HookOS release is <a href="https://github.com/tinkerbell/hook/releases/tag/v0.10.0">v0.10.0</a>,
so that&rsquo;s what I went with. Besides the standard x86_64 and aarch64 kernels,
HookOS also provides a variant with <a href="https://www.armbian.com/">Armbian&rsquo;s</a>
Raspberry Pi kernel and an initramfs built for aarch64. I started with that one.</p>
<p>In preparation, I downloaded the <code>hook_armbian-bcm2711-current.tar.gz</code> file from
the HookOS release, which contains the kernel and initramfs. But it leaves out some Pi-specific
files which are also needed when netbooting a Pi. I decided to get those
files from Armbian as well, specifically from <a href="https://www.armbian.com/rpi4b/">this page</a>.
I chose the &ldquo;Minimal/IoT&rdquo; image. Then I mounted the image locally to get at the
content of the boot partition:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>losetup -f --show -P Armbian_25.5.1_Rpi4b_noble_current_6.12.28_minimal.img
</span></span><span style="display:flex;"><span>mount /dev/loop0p1 /mnt/temp/
</span></span></code></pre></div><p>This then allowed me to copy a couple of files, namely:</p>
<ul>
<li><code>bcm2711-rpi-4-b.dtb</code> (that was the only dtb I copied, because I&rsquo;m working only with a Pi 4b for now)</li>
<li><code>cmdline.txt</code></li>
<li><code>config.txt</code></li>
<li><code>fixup4*</code></li>
<li><code>start4*</code></li>
</ul>
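<p>Copying those files from the mounted boot partition into the TFTP directory can be wrapped in a tiny helper. This is only a sketch. The function name <code>copy_pi_boot_files</code> and the TFTP path in the usage line are hypothetical, and the sketch assumes the image&rsquo;s boot partition is still mounted at <code>/mnt/temp</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: copy the Pi 4 specific boot files from a mounted
# Armbian boot partition (first argument) into a TFTP directory (second).
copy_pi_boot_files() {
  src="$1"
  dst="$2"
  mkdir -p "$dst" || return 1
  # Only the Pi 4B device tree, since that's the only board involved here.
  cp "$src/bcm2711-rpi-4-b.dtb" "$src/cmdline.txt" "$src/config.txt" "$dst/" || return 1
  # Firmware blobs: all start4* and fixup4* variants.
  cp "$src"/start4* "$src"/fixup4* "$dst/"
}

# Usage (paths are examples):
# copy_pi_boot_files /mnt/temp /srv/tftp
</code></pre></div>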
<p>The next challenge was the kernel command line. Tinkerbell provides a few
important values through the kernel command line when booting via its iPXE
script, so I booted the Pi via iPXE again to copy the kernel command line from
there. But I did not have direct access to the Pi from my desktop, because HookOS
doesn&rsquo;t run SSH by default. I was using it through a separate keyboard and display.</p>
<p>At that point I had to sit back for a few minutes and consider my life choices
a bit. Because with all the services I&rsquo;ve got running in my Homelab, all the
Kubernetes clusters, the Ceph storage clusters, the myriad of apps - I somehow
did not have a no-frills, zero-config way to copy+paste the kernel command line
from one host to another. I was a bit disappointed in myself.</p>
<p>But then, I had an excellent idea, if I may say so myself: <a href="https://en.wikipedia.org/wiki/Netcat">Netcat!</a>
It can do simple TCP transfers. So I launched this command on my desktop:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>nc -l -p <span style="color:#ae81ff">1234</span> &gt; out.txt
</span></span></code></pre></div><p>And then, on the Pi booted into HookOS, I ran this:</p>
<pre tabindex="0"><code>dmesg &gt; out.txt
nc -w 3 198.51.100.25 1234 &lt; out.txt
</code></pre><p>And just like that, I had the data available on my desktop. I&rsquo;m honestly a bit
enamored with myself for coming up with this rather simple and expedient solution. &#x1f601;</p>
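<p>As an aside, the running kernel&rsquo;s command line is also available directly in <code>/proc/cmdline</code>, which saves digging through the <code>dmesg</code> output. Here is a minimal sketch that keeps only the Tinkerbell-related parameters. It assumes that the parameter names from the iPXE command line are the complete set of interest, and the function name <code>tink_cmdline_args</code> is hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: print the Tinkerbell-specific key=value pairs from a
# kernel command line, one per line. Defaults to the running kernel's.
# Runs in a subshell so that "set -f" doesn't leak into the caller.
tink_cmdline_args() (
  cmdline="${1:-$(cat /proc/cmdline)}"
  set -f  # no globbing while splitting the command line on whitespace
  for arg in $cmdline; do
    case "$arg" in
      tink_worker_image=*|facility=*|syslog_host=*|grpc_authority=*|\
      tinkerbell_tls=*|tinkerbell_insecure_tls=*|worker_id=*|hw_addr=*)
        printf '%s\n' "$arg" ;;
    esac
  done
)

# On the Pi, the result could then go to the netcat listener from above:
# tink_cmdline_args | nc -w 3 198.51.100.25 1234
</code></pre></div>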
<p>The important bits of the command line looked like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tink_worker_image<span style="color:#f92672">=</span>ghcr.io/tinkerbell/tink-agent:v0.18.3-b817f7f2 facility<span style="color:#f92672">=</span> syslog_host<span style="color:#f92672">=</span>203.0.113.200 grpc_authority<span style="color:#f92672">=</span>203.0.113.200:42113 tinkerbell_tls<span style="color:#f92672">=</span>false tinkerbell_insecure_tls<span style="color:#f92672">=</span>false worker_id<span style="color:#f92672">=</span>e4:5f:01:bc:f4:ce hw_addr<span style="color:#f92672">=</span>e4:5f:01:bc:f4:ce modules<span style="color:#f92672">=</span>loop,squashfs,sd-mod,usb-storage initrd<span style="color:#f92672">=</span>initramfs-aarch64
</span></span></code></pre></div><p>I added those Tinkerbell-specific options to the <code>cmdline.txt</code> file in the
TFTP directory and also adapted the <code>config.txt</code>, setting the HookOS kernel and
initramfs:</p>
<pre tabindex="0"><code>[all]
kernel=vmlinuz-armbian-bcm2711-current
initramfs initramfs-armbian-bcm2711-current followkernel
</code></pre><p>With all of that done, I booted the Pi up. While the kernel booted and
containerd from the initramfs was started as well, I got no shell, and the
Tinkerbell logs showed no attempt by the Pi to contact Tinkerbell. I did
not even see a DHCP request for an IP after the netboot part was done.
The only error message I could see on the small screen I was using was this one:</p>
<pre tabindex="0"><code>Failed to read service spec error &#34;open /containers/services/getty/config.json: No such file or directory&#34;
</code></pre><p>Considering that getty is what provides the shell, at least I now knew
why I wasn&rsquo;t getting a prompt. So I unpacked the initramfs with this command
to make sure the file was actually there:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>gunzip -c initramfs-armbian-bcm2711-current | cpio -i
</span></span></code></pre></div><p>And yes, the file does exist in the initramfs. So what&rsquo;s going on here?
My main problem was that without a shell, I had no good way to look at the rest
of the boot messages and check whether there was another error.</p>
<p>So I went for a bad way instead: filming the small screen I had connected to the
Pi. As you might imagine, this wasn&rsquo;t a great solution. First, I had to transfer
the video to my desktop via Nextcloud, because I couldn&rsquo;t properly read
anything on my phone&rsquo;s screen. Second, the video is recorded at a fixed
framerate, and sometimes the logs scrolled by too quickly to catch everything.</p>
<p>This is what all too much of the video looked like:</p>
<figure>
    <img loading="lazy" src="unreadable-output.jpg"
         alt="A picture of a small screen showing Linux kernel startup logs. The output is all jumbled up, with previous lines still partially visible, faded under the current lines."/> <figcaption>
            <p>Not really readable output from trying to take a video of the boot process.</p>
        </figcaption>
</figure>

<p>But I still got a bit more out of it, most importantly this message:</p>
<pre tabindex="0"><code>rootfs image is not initramfs (read error): looks like an initrd
</code></pre><p>But that wasn&rsquo;t particularly helpful. At least on my desktop, the initramfs looked
perfectly fine, and I had no issues unpacking it at all.</p>
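<p>A quick sanity check for an initramfs file on the desktop is to read its magic bytes to see how it is compressed, then let the matching tool test the compressed stream for completeness. Here is a sketch using the hypothetical function name <code>check_initramfs</code>. It only knows gzip and zstd, and the zstd branch assumes the <code>zstd</code> CLI is installed:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: print the compression format of an initramfs (judged
# by its first four bytes) and test whether the compressed stream is intact.
# Returns non-zero for unknown formats and for truncated/corrupt files.
check_initramfs() {
  f="$1"
  magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
  case "$magic" in
    1f8b*)    echo "gzip"; gzip -t "$f" ;;
    28b52ffd) echo "zstd"; zstd -q -t "$f" ;;
    *)        echo "unknown ($magic)"; return 1 ;;
  esac
}
</code></pre></div><p>For the HookOS initramfs, which <code>gunzip</code> handled fine above, this should print <code>gzip</code> and succeed.</p>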
<p>But while fiddling with kernel command line options and the <code>config.txt</code> content,
to no avail at all, I suddenly noticed the <code>console=</code> option and realized that I
could make my life at least a bit easier. I got out my trusty USB-to-Serial
adapter and followed <a href="https://www.jeffgeerling.com/blog/2021/attaching-raspberry-pis-serial-console-uart-debugging">this tutorial</a>
to get it attached to the Pi. After adding <code>console=serial0,115200</code> to the
kernel command line, I was then able to connect to the Pi via serial console.
I used minicom on my desktop, where the serial adapter showed up as <code>/dev/ttyUSB0</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>minicom -b <span style="color:#ae81ff">115200</span> -D /dev/ttyUSB0
</span></span></code></pre></div><p>And just like that, I had all of the boot time messages on my desktop and no
longer needed to film the boot process.</p>
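<p>The Pi firmware expects <code>cmdline.txt</code> to be a single line, so adding the console option means appending to that line rather than adding a new one. Here is a small idempotent sketch using GNU <code>sed -i</code>. The function name is hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: append a serial console to cmdline.txt, keeping the
# file a single line and not adding the option twice.
ensure_serial_console() {
  if ! grep -q 'console=serial0,' "$1"; then
    # Append to the end of line 1 instead of adding a second line.
    sed -i '1s/$/ console=serial0,115200/' "$1"
  fi
}

# Usage (path is an example):
# ensure_serial_console /srv/tftp/cmdline.txt
</code></pre></div>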
<p>But I still wasn&rsquo;t really getting anywhere; the errors stayed the same. I also
tried a few other kernels, thinking that there might be something wrong with the
HookOS kernel. For example, I tried the Ubuntu kernel I use for my production
Pis, but to no avail.</p>
<p>So I decided I would dig into HookOS and <a href="https://github.com/linuxkit/linuxkit">LinuxKit</a>,
which HookOS is based on.</p>
<h2 id="trying-a-newer-kernel">Trying a newer kernel</h2>
<p>Still having no idea what&rsquo;s going on, I decided to try a newer kernel. The
last HookOS release was from November 2024, so I figured something might have
changed since then.</p>
<p>And at this point, I have to give a really big kudos to the Tinkerbell team
for HookOS&rsquo; builds. I was perfectly prepared to spend some time getting my VM set
up properly to actually build HookOS successfully. But I didn&rsquo;t need to.
Most dependencies were installed automatically, and everything
went very smoothly. I was rather impressed. &#x1f44d;</p>
<p>So for the experiment, I cloned the <a href="https://github.com/tinkerbell/hook">HookOS repo</a>
locally and switched into it. Then I had to manually install a few tools which
were missing:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>apt install docker.io docker-buildx
</span></span></code></pre></div><p>Then I just executed the build script:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build.sh kernel armbian-bcm2711-current
</span></span></code></pre></div><p>This installed a few additional dependencies via apt, and then built an OCI
image with the newest Armbian Pi kernel. Then, to build the full HookOS, including
device trees and initramfs, I ran this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>./build.sh build armbian-bcm2711-current
</span></span></code></pre></div><p>At the time I executed the commands, the Armbian kernel I got was <code>6.12.35-S8292-Dbdda-P0000-Ce6dbH2313-HK01ba-Vc222-Ba566-R448a</code>.
The build produces an <code>out/</code> directory in the current directory, which contains the
device tree, Raspberry Pi overlays and kernel+initramfs. I copied it all into
my TFTP directory and tried to boot the Pi again. But yet again, the kernel
booted and containerd started, but I got no prompt.</p>
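<p>When shuffling kernels and initramfs files around like this, it is easy to end up with a <code>config.txt</code> that points at a file that isn&rsquo;t actually in the TFTP directory. Here is a sketch of a check for that. It assumes the <code>kernel=</code> and <code>initramfs NAME followkernel</code> syntax used in my <code>config.txt</code> above, and the function name <code>check_tftp_dir</code> is hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: verify that the kernel and initramfs named in a Pi
# config.txt exist in the same (TFTP) directory. Prints what is missing.
check_tftp_dir() {
  dir="$1"
  # Takes the last kernel= line; [section] filters are ignored for simplicity.
  kernel=$(sed -n 's/^kernel=//p' "$dir/config.txt" | tail -n 1)
  # "initramfs FILE followkernel": capture FILE.
  initramfs=$(sed -n 's/^initramfs \([^ ]*\).*/\1/p' "$dir/config.txt" | tail -n 1)
  rc=0
  for name in "$kernel" "$initramfs"; do
    if [ -z "$name" ] || [ ! -f "$dir/$name" ]; then
      echo "missing: ${name:-entry in config.txt}"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (path is an example):
# check_tftp_dir /srv/tftp
</code></pre></div>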
<p>In a last desperate attempt, I tried a <code>5.15</code> kernel from Armbian&rsquo;s
<code>kernel-bcm2711-legacy</code>, but that also ran into exactly the same issue.</p>
<h2 id="mangling-hookos">Mangling HookOS</h2>
<p>After fruitlessly playing around for a while, I came across more and more
Google hits talking about initramfs files being truncated by some implementations. So
I decided to try to reduce the size by re-compressing the initramfs with zstd.
That only got the initramfs down to 122 MB, but it did something
more important: It confirmed the truncation theory via this kernel message:</p>
<pre tabindex="0"><code>rootfs image is not initramfs (ZSTD-compressed data is truncated); looks like an initrd
[...]
RAMDISK: zstd image found at block 0
RAMDISK: incomplete write (-28 != 131072)
</code></pre><p>This error indicates that the initramfs compression was correctly recognized,
but the data was truncated. I finally had proper proof that truncation was the
problem.</p>
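<p>The re-compression step itself can be sketched as a pipe from <code>gzip</code> into <code>zstd</code>. This assumes the original initramfs is gzip-compressed, as the <code>gunzip</code> command earlier implies. The compression level and the function name <code>recompress_zstd</code> are illustrative choices:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: re-compress a gzip-compressed initramfs with zstd
# and report both sizes in bytes. The level (-19) is an arbitrary choice.
recompress_zstd() {
  in="$1"
  out="$2"
  # Check the gzip input first, because a plain pipe would otherwise
  # only report zstd's exit status.
  gzip -t "$in" || return 1
  gzip -dc "$in" | zstd -q -f -19 -o "$out" || return 1
  echo "before: $(wc -c "$in" | awk '{print $1}') bytes"
  echo "after:  $(wc -c "$out" | awk '{print $1}') bytes"
}

# Usage, then point the initramfs line in config.txt at the new file:
# recompress_zstd initramfs-armbian-bcm2711-current initramfs-armbian-bcm2711-current.zst
</code></pre></div>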
<p>For further investigation, I adapted HookOS a bit to ensure that Getty gets
launched early in the boot process, in the hope that I would get a prompt and
could look around.</p>
<p>LinuxKit, the dockerized Linux distro HookOS is built upon, has a template file
which describes what to put into the initramfs. The one for HookOS looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">kernel</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_KERNEL_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">cmdline</span>: <span style="color:#e6db74">&#34;464vn90e7rbj08xbwdjejmdf4it17c5zfzjyfhthbh19eij201hjgit021bmpdb9ctrc87x2ymc8e7icu4ffi15x1hah9iyaiz38ckyap8hwx2vt5rm44ixv4hau8iw718q5yd019um5dt2xpqqa2rjtdypzr5v1gun8un110hhwp8cex7pqrh2ivh0ynpm4zkkwc8wcn367zyethzy7q8hzudyeyzx3cgmxqbkh825gcak7kxzjbgjajwizryv7ec1xm2h0hh7pz29qmvtgfjj1vphpgq1zcbiiehv52wrjy9yq473d9t1rvryy6929nk435hfx55du3ih05kn5tju3vijreru1p6knc988d4gfdz28eragvryq5x8aibe5trxd0t6t7jwxkde34v6pj1khmp50k6qqj3nzgcfzabtgqkmeqhdedbvwf3byfdma4nkv3rcxugaj2d0ru30pa2fqadjqrtjnv8bu52xzxv7irbhyvygygxu1nt5z4fh9w1vwbdcmagep26d298zknykf2e88kumt59ab7nq79d8amnhhvbexgh48e8qc61vq2e9qkihzt1twk1ijfgw70nwizai15iqyted2dt9gfmf2gg7amzufre79hwqkddc1cd935ywacnkrnak6r7xzcz7zbmq3kt04u2hg1iuupid8rt4nyrju51e6uejb2ruu36g9aibmz3hnmvazptu8x5tyxk820g2cdpxjdij766bt2n3djur7v623a2v44juyfgz80ekgfb9hkibpxh3zgknw8a34t4jifhf116x15cei9hwch0fye3xyq0acuym8uhitu5evc4rag3ui0fny3qg4kju7zkfyy8hwh537urd5uixkzwu5bdvafz4jmv7imypj543xg5em8jk8cgk7c4504xdd5e4e71ihaumt6u5u2t1w7um92fepzae8p0vq93wdrd1756npu1pziiur1payc7kmdwyxg3hj5n4phxbc29x0tcddamjrwt260b0w&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># this init container sha has support for volumes</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/init:872d2e1be745f1acb948762562cf31c367303a3b</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_RUNC_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_CONTAINERD_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/ca-certificates:v1.0.0</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/firmware:24402a25359c7bc290f7fc3cd23b6b5f0feb32a5</span> <span style="color:#75715e"># &#34;Some&#34; firmware from Linuxkit pkg; see https://github.com/linuxkit/linuxkit/blob/master/pkg/firmware/Dockerfile</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;${HOOK_CONTAINER_EMBEDDED_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">onboot</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">dhcpcd-once</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/dhcpcd:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/etc/ip/dhcp.sh&#34;</span>, <span style="color:#e6db74">&#34;true&#34;</span> <span style="color:#f92672">] # 2nd paramter is one-shot true/false</span>: <span style="color:#66d9ef">true</span> <span style="color:#ae81ff">for onboot, false for services</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#capabilities.add:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC&#39;s that boot in 1970</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/lib/dhcpcd:/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/ip/dhcp.sh:/etc/ip/dhcp.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dhcpcd.conf:/dhcpcd.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">udev</span> <span style="color:#75715e"># as a service; so system reacts to changes in devices</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_UDEV_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/lib/systemd/systemd-udevd&#34;</span>, <span style="color:#e6db74">&#34;--debug&#34;</span> ]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>: [ <span style="color:#ae81ff">all ]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds</span>: [ <span style="color:#ae81ff">/dev:/dev, /sys:/sys, /lib/modules:/lib/modules ]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">rootfsPropagation</span>: <span style="color:#ae81ff">shared</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">pid</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">type</span>: <span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">getty</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/getty:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/profile.d/local.sh:/etc/profile.d/local.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/securetty:/etc/securetty</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/motd:/etc/motd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/os-release:/etc/os-release</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/:/host_root</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev:/dev</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev/console:/dev/console</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/usr/bin/nerdctl:/usr/bin/nerdctl</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">INSECURE=true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hook-docker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_DOCKER_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">pid</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mounts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">cgroup2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">options</span>: [ <span style="color:#e6db74">&#34;rw&#34;</span>, <span style="color:#e6db74">&#34;nosuid&#34;</span>, <span style="color:#e6db74">&#34;noexec&#34;</span>, <span style="color:#e6db74">&#34;nodev&#34;</span>, <span style="color:#e6db74">&#34;relatime&#34;</span> ]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destination</span>: <span style="color:#ae81ff">/sys/fs/cgroup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev/console:/dev/console</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dev:/dev</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/resolv.conf:/etc/resolv.conf</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/lib/modules:/lib/modules</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/docker:/var/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/images:/var/lib/docker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/worker:/worker</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/:/host_root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/images</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/docker</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/worker</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">devices</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">b</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">type</span>: <span style="color:#ae81ff">c</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">hook-bootkit</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#e6db74">&#34;${HOOK_CONTAINER_BOOTKIT_IMAGE}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">net</span>: <span style="color:#ae81ff">host</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mounts</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#f92672">type</span>: <span style="color:#ae81ff">cgroup2</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">options</span>: [ <span style="color:#e6db74">&#34;rw&#34;</span>, <span style="color:#e6db74">&#34;nosuid&#34;</span>, <span style="color:#e6db74">&#34;noexec&#34;</span>, <span style="color:#e6db74">&#34;nodev&#34;</span>, <span style="color:#e6db74">&#34;relatime&#34;</span> ]
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">destination</span>: <span style="color:#ae81ff">/sys/fs/cgroup</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/run/docker:/var/run</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/run/docker</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">dhcpcd-daemon</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">image</span>: <span style="color:#ae81ff">linuxkit/dhcpcd:v1.0.0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">command</span>: [ <span style="color:#e6db74">&#34;/etc/ip/dhcp.sh&#34;</span>, <span style="color:#e6db74">&#34;false&#34;</span> <span style="color:#f92672">] # 2nd paramter is one-shot true/false</span>: <span style="color:#66d9ef">true</span> <span style="color:#ae81ff">for onboot, false for services</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#capabilities.add:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e">#  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC&#39;s that boot in 1970</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">all</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">binds.add</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/var/lib/dhcpcd:/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/run:/run</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/etc/ip/dhcp.sh:/etc/ip/dhcp.sh</span>
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">/dhcpcd.conf:/dhcpcd.conf</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">runtime</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mkdir</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">/var/lib/dhcpcd</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">files</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">etc/os-release</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0444&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">contents</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      NAME=&#34;HookOS&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      VERSION=${HOOK_VERSION}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ID=hookos
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      VERSION_ID=${HOOK_VERSION}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      PRETTY_NAME=&#34;HookOS ${HOOK_KERNEL_ID} v${HOOK_VERSION}/k${HOOK_KERNEL_VERSION}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ANSI_COLOR=&#34;1;34&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      HOME_URL=&#34;https://github.com/tinkerbell/hook&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">path</span>: <span style="color:#ae81ff">etc/securetty</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">contents</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB1
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      ttyUSB2</span>
</span></span></code></pre></div><p>The above is only meant as an example, so I removed a lot of lines
and comments. If you&rsquo;d like to see the full file, take a look at the
<a href="https://github.com/tinkerbell/hook/blob/main/linuxkit-templates/hook.template.yaml">GitHub repo</a>.</p>
<p>My idea was to see whether I could get Getty into the root of the
initramfs, instead of having it launched as a container. Looking at the YAML file,
I decided to simply try moving it from the <code>services:</code> list to the <code>init:</code>
list, like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">linuxkit/getty:v1.0.0</span>
</span></span></code></pre></div><p>And that actually worked! The other issues remained and the image was
still truncated, but Getty now came early enough in the image to end up in
the non-truncated part. I was now getting a prompt when booting into the initramfs.</p>
<p>Looking around, I still couldn&rsquo;t find any other obvious errors, just more
boot services which failed to start because their <code>config.json</code> files had
fallen victim to the truncation. But I did get another piece of evidence that
truncation was happening: the unpacked initramfs on my VM had a total size of
603 MB, while <code>/</code> in the booted initramfs only showed 404 MB in total.
Weirdly, part of those 404 MB was a 90 MB <code>initrd.img</code> file
in <code>/</code> which I couldn&rsquo;t make heads or tails of. The file definitely wasn&rsquo;t
part of the actual initramfs, and Google didn&rsquo;t help me figure out where it came
from or what was in it either.</p>
<p>Anyone got any idea what that <code>initrd.img</code> file suddenly appearing in my initramfs
might be?</p>
<p>At this point it was pretty clear that I was dealing with a truncation problem.
Time for some googling to answer the next question: where was it happening?</p>
<h2 id="figuring-out-whos-truncating">Figuring out who&rsquo;s truncating</h2>
<p>Initial searches pointed me towards TFTP as the culprit. The <a href="https://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol">Wikipedia article</a>
has this to say:</p>
<blockquote>
<p>The original protocol has a transfer file size limit of 512 bytes/block x 65535 blocks = 32 MB. In 1998 this limit was extended to 65535 bytes/block x 65535 blocks = 4 GB by TFTP Blocksize Option RFC 2348. [&hellip;] If TFTP packets should be kept within the standard Ethernet MTU (1500), the blocksize value is calculated as 1500 minus headers of TFTP (4 bytes), UDP (8 bytes) and IP (20 bytes) = 1468 bytes/block, this gives a limit of 1468 bytes/block x 65535 blocks = 92 MB. Today most servers and clients support block number roll-over (block counter going back to 0 or 1 after 65535) which gives an essentially unlimited transfer file size.</p></blockquote>
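<p>The numbers in that quote are easy to reproduce with a bit of POSIX shell
arithmetic. This is just a quick sketch with helper names of my own choosing,
assuming a 16-bit block counter that tops out at 65535 and no roll-over:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">#!/bin/sh
# Largest file a TFTP client can fetch without block number roll-over:
# the 16-bit block counter tops out at 65535 blocks.
tftp_max_size() {
    blksize="$1"
    echo $((blksize * 65535))
}

# Largest block size that fits into one Ethernet frame:
# MTU minus the IPv4 (20), UDP (8) and TFTP (4) headers.
blksize_for_mtu() {
    mtu="$1"
    echo $((mtu - 20 - 8 - 4))
}

blksize=$(blksize_for_mtu 1500)
echo "Original 512-byte blocks: $(tftp_max_size 512) bytes"
echo "blksize for a 1500-byte MTU: ${blksize} bytes"
echo "Limit with ${blksize}-byte blocks: $(tftp_max_size "$blksize") bytes"
</code></pre>
<p>The last line prints 96205380 bytes, roughly 91.7 MiB, which is where the
&ldquo;92 MB&rdquo; in the quote comes from.</p>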
<p>So it looked like, unless the Pi firmware implemented block number roll-over,
the maximum file size would be 92 MB. To verify that, I took a tcpdump capture
of the transfer of a 155 MB initramfs.
Here is the option acknowledgment packet:</p>
<p><figure>
    <img loading="lazy" src="read-req-ack.png"
         alt="A screenshot of a Wireshark packet output. It shows the TFTP content of the packet. The destination file is named as &#39;initramfs-armbian-bcm2711-legacy&#39;, the blksize option is 1468 and the tsize is 155222922 bytes."/> <figcaption>
            <p>Acknowledged options for the initramfs transfer</p>
        </figcaption>
</figure>

So the block size is negotiated to the maximum for my 1500-byte MTU network,
and the total size (<code>tsize</code>) of 155 MB also appears to be correct.</p>
<p>And here is the end of the transmission:
<figure>
    <img loading="lazy" src="read-finished.png"
         alt="A screenshot of a Wireshark packet output. It shows the end of the transfer, with a total of 105738 TFTP fragments and 155222922 bytes of data, exactly the same number as the tsize option from the start of the transmission. It also shows the actual block number as 40202."/> <figcaption>
            <p>Final data packet of the TFTP transfer for the initramfs</p>
        </figcaption>
</figure>

This output shows two things: First, exactly as many bytes were transferred as the
<code>tsize</code> option in the option acknowledgment at the start announced. Second, far
more blocks (105738) were transferred than the maximum block number of 65535. The
block number of the last block was 40202, which indicates that the
previously mentioned block number roll-over worked as intended.</p>
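<p>Those numbers also fit together arithmetically. The sketch below, again with
helper names of my own, computes how many DATA packets a transfer of the given
<code>tsize</code> needs at a <code>blksize</code> of 1468, and which block number the
last packet would carry depending on whether the counter rolls over to 0 or to 1
after 65535:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">#!/bin/sh
# Number of DATA packets for a TFTP transfer. The transfer ends with the
# first packet shorter than blksize, so a file that is an exact multiple
# of blksize needs one extra, empty packet; integer division plus one
# covers both cases.
tftp_block_count() {
    tsize="$1"
    blksize="$2"
    echo $((tsize / blksize + 1))
}

# Block number carried by the n-th DATA packet when the 16-bit counter
# rolls over from 65535 to 0.
block_number_wrap0() {
    echo $(($1 % 65536))
}

# Same, but for implementations that roll over from 65535 to 1.
block_number_wrap1() {
    echo $(( ($1 - 1) % 65535 + 1 ))
}

blocks=$(tftp_block_count 155222922 1468)
echo "DATA packets: ${blocks}"
echo "Last block number (wrap to 0): $(block_number_wrap0 "$blocks")"
echo "Last block number (wrap to 1): $(block_number_wrap1 "$blocks")"
</code></pre>
<p>This prints 105738 packets and a last block number of 40202 for a roll-over
to 0 (40203 for a roll-over to 1), so the captured value suggests the counter
wrapped from 65535 back to 0.</p>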
<p>Overall, it did look like the entire file got transferred correctly.</p>
<p>So the next possibility was that something went wrong after the transfer.
To check that, I had to look at the Pi&rsquo;s early boot process. First, I enabled
the <code>BOOT_UART=1</code> option. This option lives in the Pi&rsquo;s bootloader config stored in
EEPROM, so it needs to be set via the <code>rpi-eeprom-config</code> script from a running
Linux; it cannot be set via the <code>config.txt</code> file. Once that was done, I got the
first disappointment: the output just stopped at this point:</p>
<pre tabindex="0"><code>TFTP_GET: aa:ce:d5:6e:90:cd 203.0.113.18 start4.elf

RX: 12 IP: 0 IPV4: 10 MAC: 10 UDP: 10 UDP RECV: 10 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
TFTP: complete 2256224
RX: 14 IP: 0 IPV4: 12 MAC: 12 UDP: 12 UDP RECV: 12 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
Read start4.elf bytes  2256224 hnd 0x0
[...]
Starting start4.elf @ 0xfec00200 partition -1
</code></pre><p>It only resumed once the kernel started booting. To get output from
the <code>start4.elf</code> execution, I had to add another option, <a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#uart_2ndstage">uart_2ndstage</a>. Luckily, this option can be set in the <code>config.txt</code> file, so no further trip
into Linux was necessary.</p>
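<p>For reference, here is roughly what the two settings look like. <code>BOOT_UART</code>
goes into the bootloader configuration, which can be edited from Raspberry Pi OS
with <code>sudo -E rpi-eeprom-config --edit</code>; the change is applied to the EEPROM
on the next reboot. This is a minimal excerpt, not a complete configuration:</p>
<pre tabindex="0"><code>[all]
BOOT_UART=1
</code></pre>
<p>The second-stage option goes into <code>config.txt</code> on the TFTP server:</p>
<pre tabindex="0"><code>uart_2ndstage=1
</code></pre>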
<p>That then finally delivered the answer to the question of where the
truncation happens with this message:</p>
<pre tabindex="0"><code>MESS:00:00:55.768976:0: initramfs loaded to 0x29440000 (size 0x5bbfa44)
</code></pre><p>The size given here, 0x5bbfa44 bytes (96,205,380 bytes), is approximately 96 MB. So even though the file was larger,
and it looked like it was transferred in its entirety, the Pi&rsquo;s firmware only
loaded 96 MB into memory for the kernel to use. And that&rsquo;s where the truncation
was coming from.</p>
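<p>Converting that hex value to decimal shows something interesting: 0x5bbfa44 is
exactly 1468 &times; 65535 bytes, i.e. precisely 65535 full blocks at the
negotiated block size. A quick check in POSIX shell:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">#!/bin/sh
# Size from the firmware log line:
# "initramfs loaded to 0x29440000 (size 0x5bbfa44)"
loaded_size=$((0x5bbfa44))
# blksize negotiated in the TFTP option acknowledgment
blksize=1468
# Largest file that fits into 65535 blocks without roll-over
max_without_rollover=$((blksize * 65535))

echo "Loaded by the firmware: ${loaded_size} bytes"
echo "65535 blocks of ${blksize} bytes: ${max_without_rollover} bytes"
if [ "$loaded_size" -eq "$max_without_rollover" ]; then
    echo "Exactly 65535 full blocks were loaded"
fi
</code></pre>
<p>My guess, which I haven&rsquo;t verified, is that the firmware&rsquo;s loader simply
stops at the 16-bit block limit, even though the transfer itself continues past
the roll-over.</p>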
<h2 id="new-plan">New plan</h2>
<p>So I needed a new plan. I think the most reasonable next approach is to
turn the boot process into a two-stage setup. The first stage would be a small initramfs
containing only the tools needed to download the second stage, the full
initramfs, and then pivot into that new image.</p>
<p>One problem is that I don&rsquo;t want to hardcode the address/name of the second-stage
image into the first-stage initramfs. One possible option would be to add a
parameter to the kernel command line, as the kernel passes all parameters it doesn&rsquo;t
recognize on to the <code>init</code> binary it executes after startup (as command line
arguments, or as environment variables for <code>key=value</code> parameters).</p>
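<p>A first-stage init could then read such a parameter from <code>/proc/cmdline</code>.
Here is a small sketch of that in POSIX shell; the helper <code>cmdline_value</code>
and the parameter name <code>hook_initramfs_url</code> are hypothetical, not something
HookOS or the Pi firmware define:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">#!/bin/sh
# Hypothetical helper: print the value of a key=value parameter from a
# kernel command line string (e.g. the contents of /proc/cmdline).
# Returns 1 if the parameter is not present.
cmdline_value() {
    key="$1"
    cmdline="$2"
    # Disable globbing so parameters containing * or ? aren't expanded
    set -f
    for param in $cmdline; do
        case "$param" in
            "$key"=*)
                set +f
                # Strip everything up to and including the first "="
                printf '%s\n' "${param#*=}"
                return 0
                ;;
        esac
    done
    set +f
    return 1
}
</code></pre>
<p>In the first stage, this could be used as
<code>url=$(cmdline_value hook_initramfs_url "$(cat /proc/cmdline)")</code> before
downloading the image and handing over to it, e.g. with <code>switch_root</code>.</p>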
<p>This approach has the advantage that the overall HookOS process doesn&rsquo;t need to
be changed. The original initramfs can be left entirely untouched and never needs
to know that there was another boot stage for specific boards.</p>
<p>Testing that will be my next task. I wanted to get this post out first, because
with both the description of my investigation and the explanation of the solution,
it would have ended up as another one of my tomes. And the
&ldquo;posts, not tomes&rdquo; project is still in effect. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part IV: Provisioning a Raspberry Pi 4</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/</link>
      <pubDate>Sun, 29 Jun 2025 17:20:54 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-4-provisioning-pi4/</guid>
      <description>I configure Tinkerbell and Dnsmasq to provision a Pi 4 with a USB SSD</description>
      <content:encoded><![CDATA[<p>In this post, I will show how I provisioned a Raspberry Pi 4 with an attached
USB SSD via Tinkerbell.</p>
<p>This is part 4 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p>The main goal of this post is to get this little guy to boot into Tinkerbell&rsquo;s
<a href="https://tinkerbell.org/docs/additionalcomponents/hookos/">HookOS</a> and install
an Ubuntu 24.04 Raspberry Pi image onto the SSD:</p>
<figure>
    <img loading="lazy" src="the-pi.jpg"
         alt="A picture of a desk with a Raspberry Pi 4 board and accessories. The Pi 4 is clad in a passive red heat sink and mounted on a right-angle piece of metal. It&#39;s connected to a small 7 inch screen with an HDMI and a USB cable. Furthermore, it&#39;s also connected to a keyboard and has a network cable plugged in. Finally, it&#39;s also connected to a 2.5 inch Kingston SATA SSD via a USB-to-SATA adapter."/> <figcaption>
            <p>My experimental setup.</p>
        </figcaption>
</figure>

<p>To get the Ubuntu image onto the SSD and have the Pi boot from it, the following
steps need to be executed:</p>
<ol>
<li>Boot the Pi into a UEFI firmware via the Pi&rsquo;s weird PXE boot procedure</li>
<li>From the UEFI firmware, boot into iPXE, again via PXE boot</li>
<li>Fetch the iPXE script to execute HookOS from Tinkerbell. Again, you guessed
it, via PXE</li>
<li>Finally, boot HookOS itself</li>
</ol>
<h2 id="raspberry-pi-pxe-boot-to-uefi">Raspberry Pi PXE boot to UEFI</h2>
<p>To understand the rest of this post, let&rsquo;s start with a quick look at the
Raspberry Pi&rsquo;s netboot process. It all starts with a DHCP request. The direct
reply to that request might already contain a TFTP server address. If it doesn&rsquo;t,
the Pi&rsquo;s firmware will also wait for a Proxy DHCP reply. This makes it possible
to separate the regular DHCP server doing IP address management from the
DHCP server which supplies the PXE boot parameters.</p>
<p>Once a TFTP server address has been received, the Pi starts downloading files
from it. The Pi netboot process does not support the boot file option that can
also be supplied for PXE. It doesn&rsquo;t matter what that option is set to, or
whether it&rsquo;s sent in the DHCP reply at all; it&rsquo;s simply ignored. The first file
downloaded is <code>config.txt</code>, which contains configuration for the
firmware. Relevant to the boot process are the options for the <a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#kernel">kernel</a> and
<a href="https://www.raspberrypi.com/documentation/computers/config_txt.html#initramfs">initramfs</a>
as well as, in this particular case, the <a href="https://www.raspberrypi.com/documentation/computers/legacy_config_txt.html#armstub">armstub</a>
option. The <code>armstub</code> tells the boot firmware (which runs on the GPU on this
SoC) what to load onto the ARM CPU cores after the initial boot. By default,
the firmware just loads the <code>kernel</code> and <code>initramfs</code> during a normal boot,
leading to Linux being started. But when <code>armstub</code> is set, the given file
is loaded instead. In all three cases, the files are loaded from the
TFTP server when netbooting.</p>
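<p>To illustrate, here are two hypothetical <code>config.txt</code> variants, one booting
Linux directly and one handing over to the UEFI firmware. The kernel and initramfs
file names are placeholders, and note that <code>initramfs</code> takes its arguments
separated by spaces instead of an <code>=</code>:</p>
<pre tabindex="0"><code># Direct Linux boot: firmware loads kernel and initramfs via TFTP
arm_64bit=1
kernel=vmlinuz-placeholder
initramfs initramfs-placeholder.img followkernel
</code></pre>
<p>And the UEFI variant, with the firmware file used in this setup:</p>
<pre tabindex="0"><code># UEFI boot: firmware loads RPI_EFI.fd onto the ARM cores instead
arm_64bit=1
armstub=RPI_EFI.fd
</code></pre>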
<p>To load the Pi in UEFI mode, I&rsquo;ve been using <a href="https://github.com/rgl/rpi4-uefi-ipxe">this repository</a>.
Initially, I thought I needed this &ldquo;special&rdquo; firmware to get iPXE running, but
it turns out that the <a href="https://ipxe.org/">iPXE project</a> already provides the
<code>snp.efi</code> file, which is compatible with a Pi 4 booted into UEFI.</p>
<p>So for now, the goal is to get the Pi booted into the UEFI stub.</p>
<h2 id="dnsmasq-server-setup">Dnsmasq server setup</h2>
<p>To supply all of these files via TFTP, I needed a TFTP server. While
Tinkerbell does provide TFTP capabilities, those are very rudimentary and only
intended to provide the iPXE binary for PXE booting hosts, and nothing more.</p>
<p>Since I already run a Dnsmasq instance in my Homelab for my regular
netbooting hosts, I decided to use it here as well. That turned out to be quite a
ride in and of itself, because of the way Kubernetes networking and DHCP interact.</p>
<p>I set Dnsmasq up on my k3s test cluster running on a VM. I could not make the
Pod use host networking, because Tinkerbell, which also needed to listen on
port 67 for DHCP, was already running on the same host. So I decided to use
the same trick that Tinkerbell uses, a <code>macvlan</code> type interface. This type of
Linux interface is attached to a real physical interface, but gets a different
MAC, so it&rsquo;s basically a completely separate interface. The rest of the network
just sees two MAC addresses behind the given switch port instead
of just one. Tinkerbell has the same approach, see <a href="https://github.com/tinkerbell/tinkerbell/blob/main/helm/tinkerbell/templates/host-interface-config-map.yaml">here</a>.</p>
<p>This script creates an additional interface, which piggybacks off the physical
interface to make it possible for a Pod to receive and send broadcast packets.
With just the VIP created by <a href="https://kube-vip.io/">kube-vip</a> for LoadBalancer
services, broadcast packets are simply not forwarded to the Pod, and Dnsmasq never
sees them. This is a problem, because the initial DHCP discover packets are sent
as broadcasts: the host hasn&rsquo;t been configured yet and doesn&rsquo;t know about the
DHCP server in the subnet.</p>
<p>After configuring the macvlan interface, I tried this Dnsmasq configuration:</p>
<pre tabindex="0"><code>port=0
dhcp-range=203.0.113.255,proxy
log-dhcp
enable-tftp
tftp-root=/tftp-files
pxe-service=0,&#34;Raspberry Pi Boot&#34;,203.0.113.17
</code></pre><p>Together with the manually created macvlan interface, Dnsmasq was able to
receive the broadcast packets, but it wasn&rsquo;t able to answer them. Instead, I
got this line in the logs:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[18135]: no address range available for DHCP request via macvlandnsm
</code></pre><p>After some digging, I figured out that Dnsmasq uses the subnet
of the interface on which a DHCP request arrives to determine which <code>dhcp-range</code>
parameter to use for the answer. In this case, the <code>macvlandnsm</code> interface
gets the hardcoded <code>127.1.1.1</code> IP in Tinkerbell&rsquo;s script. So I changed the <code>dhcp-range</code>
parameter like this:</p>
<pre tabindex="0"><code>dhcp-range=127.1.1.255,proxy
</code></pre><p>And this &ldquo;worked&rdquo;:</p>
<pre tabindex="0"><code>dnsmasq[2837]: started, version 2.91 DNS disabled
dnsmasq[2837]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp[2837]: DHCP, proxy on subnet 127.1.1.255
dnsmasq-tftp[2837]: TFTP root is /tftp-files
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[2837]: 2783272004 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[2837]: 2783272004 tags: macvlandnsm
dnsmasq-dhcp[2837]: 2783272004 broadcast response
dnsmasq-dhcp[2837]: 2783272004 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1
dnsmasq-dhcp[2837]: 2783272004 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[2837]: 2783272004 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[2837]: 2783272004 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001
</code></pre><p>So Dnsmasq did receive the DHCP request, and it also answered it. But
have a closer look at this line:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1
</code></pre><p>Note the <code>127.1.1.1</code> IP returned by Dnsmasq to the netbooting Pi. That&rsquo;s what
the Pi uses as the TFTP server. And of course, that address is in the loopback
range, and hence not reachable by the Pi at all.</p>
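<p>To make the range selection a bit more concrete, here is a simplified model of
it in POSIX shell. This is my own approximation of the behavior described above,
not Dnsmasq&rsquo;s actual code: a <code>dhcp-range</code> is considered usable if its
address lies in the same network as the receiving interface&rsquo;s address, given the
interface&rsquo;s prefix length. The helper names are hypothetical, and the network
address is computed with modulo arithmetic instead of bit masks:</p>
<pre tabindex="0"><code class="language-shell" data-lang="shell">#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs="$IFS"
    IFS=.
    set -- $1
    IFS="$old_ifs"
    echo $(( ($1 * 16777216) + ($2 * 65536) + ($3 * 256) + $4 ))
}

# Network address (as an integer) of an IPv4 address for a prefix length:
# the host part is removed by subtracting the remainder modulo 2^(32-prefix).
network_of() {
    ip_int=$(ip_to_int "$1")
    prefix="$2"
    block=1
    i=$prefix
    while [ "$i" -lt 32 ]; do
        block=$((block * 2))
        i=$((i + 1))
    done
    echo $((ip_int - ip_int % block))
}

# Simplified model: does the dhcp-range address fall into the subnet of
# the interface the request arrived on?
range_matches_interface() {
    range_addr="$1"
    iface_addr="$2"
    prefix="$3"
    [ "$(network_of "$range_addr" "$prefix")" -eq "$(network_of "$iface_addr" "$prefix")" ]
}
</code></pre>
<p>With this model, <code>range_matches_interface 203.0.113.255 203.0.113.18 24</code>
succeeds, while the same check with a prefix length of 32, or against
<code>127.1.1.1</code> with a prefix length of 8, fails.</p>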
<p>After some additional tinkering and testing, I came up with a solution: assign
the <code>macvlandnsm</code> interface a routable IP, with a <code>/24</code> prefix instead of
<code>/32</code>. Then I reset the <code>dhcp-range</code> option to contain the
actual subnet:</p>
<pre tabindex="0"><code>dhcp-range=203.0.113.255,proxy
</code></pre><p>With these changes, the Pi was then able to boot into UEFI:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[20493]: 2783272890 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[20493]: 2783272890 tags: macvlandnsm
dnsmasq-dhcp[20493]: 2783272890 broadcast response
dnsmasq-dhcp[20493]: 2783272890 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[20493]: 2783272890 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[20493]: 2783272890 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[20493]: 2783272890 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[20493]: 2783272890 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/start4.elf to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/fixup4.dat to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/bcm2711-rpi-4-b.dtb to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/miniuart-bt.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/upstream-pi4.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/RPI_EFI.fd to 203.0.113.70
</code></pre><p>I have removed a number of lines from the log output where the Pi aborted the
transmission. The Pi uses these aborted transfers to check whether a certain file
is present on the TFTP server and to decide what to download next.</p>
<p>To provide the files in the <code>/tftp-files</code> directory in the Dnsmasq Pod, I used
<a href="https://github.com/rgl/rpi4-uefi-ipxe/releases/tag/v0.11.0">this release</a>. I
took the <code>rpi4-uefi-ipxe.zip</code> file and unpacked it into the <code>/tftp-files</code>
directory, on which I had mounted a PersistentVolume.</p>
<p>I&rsquo;ve also simplified Tinkerbell&rsquo;s manual interface setup script a bit to use
it with Dnsmasq. It now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/usr/bin/env sh
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Script taken from Tinkerbell: https://raw.githubusercontent.com/tinkerbell/tinkerbell/refs/heads/main/helm/tinkerbell/templates/host-interface-config-map.yaml</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># This script allows us to listen and respond to DHCP requests on a host network interface and interact with Dnsmasq.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>set -xeuo pipefail
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> usage<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Usage: </span>$0<span style="color:#e6db74"> [OPTION]...&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Init script for setting up a network interface to listen and respond to DHCP requests from the Host and move it into a container.&#34;</span>
</span></span><span style="display:flex;"><span>    echo
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Options:&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -s, --src     Source interface for listening and responding to DHCP requests (default: default gateway interface)&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -t, --type    Create the interface of type, must be either ipvlan or macvlan (default: macvlan)&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -c, --clean   Clean up any interfaces created&#34;</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;  -h, --help    Display this help and exit&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> binary_exists<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    command -v <span style="color:#e6db74">&#34;</span>$1<span style="color:#e6db74">&#34;</span> &gt;/dev/null 2&gt;&amp;<span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> main<span style="color:#f92672">()</span> <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>    local src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$1<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_mode<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$3<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    local interface_name<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;macvlandnsm&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Preparation</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the container</span>
</span></span><span style="display:flex;"><span>    ip link del <span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the host namespace</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link del <span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Create the interface</span>
</span></span><span style="display:flex;"><span>    echo  <span style="color:#e6db74">&#34;Creating interface </span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74"> of type </span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74"> with mode </span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74"> linked to </span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link add <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> link <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> type <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> mode <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Move the interface into the Pod container</span>
</span></span><span style="display:flex;"><span>    pid<span style="color:#f92672">=</span>$$
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;Moving interface </span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74"> into container with PID </span><span style="color:#e6db74">${</span>pid<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link set <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> netns <span style="color:#e6db74">${</span>pid<span style="color:#e6db74">}</span> <span style="color:#f92672">||</span> nsenter -t1 -n ip link delete <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Bring up the interface</span>
</span></span><span style="display:flex;"><span>    ip link set dev <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> up
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Set the IP address</span>
</span></span><span style="display:flex;"><span>    ip addr add 203.0.113.18/24 dev <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> noprefixroute <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>interface_type<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;macvlan&#34;</span>
</span></span><span style="display:flex;"><span>interface_mode<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;bridge&#34;</span>
</span></span><span style="display:flex;"><span>clean<span style="color:#f92672">=</span>false
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s: means -s requires an argument</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s:: means -s has an optional argument</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># s (without colon) means -s doesn&#39;t accept arguments</span>
</span></span><span style="display:flex;"><span>args<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>getopt -a -o s::ch --long src::,clean,help -- <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> $? -gt <span style="color:#ae81ff">0</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>usage
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>eval set -- <span style="color:#e6db74">${</span>args<span style="color:#e6db74">}</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> :
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">case</span> $1 in
</span></span><span style="display:flex;"><span>    -s | --src<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># getopt emits &#39;&#39; here unless a value was attached (-sNAME, --src=NAME); treat option-like values as unset</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> <span style="color:#e6db74">&#34;--&#34;</span> <span style="color:#f92672">||</span> <span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span> <span style="color:#f92672">==</span> -* <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>          src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>          shift
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>          src_interface<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span>$2<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>          shift <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    -c | --clean<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      clean<span style="color:#f92672">=</span>true
</span></span><span style="display:flex;"><span>      shift ;;
</span></span><span style="display:flex;"><span>    -h | --help<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      usage
</span></span><span style="display:flex;"><span>      exit <span style="color:#ae81ff">1</span> ;;
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># -- means the end of the arguments; drop this, and break out of the while loop</span>
</span></span><span style="display:flex;"><span>    --<span style="color:#f92672">)</span> shift; break ;;
</span></span><span style="display:flex;"><span>    *<span style="color:#f92672">)</span> &gt;&amp;<span style="color:#ae81ff">2</span> echo Unsupported option: $1
</span></span><span style="display:flex;"><span>      usage ;;
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> -z <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    src_interface<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>nsenter -t1 -n ip route | awk <span style="color:#e6db74">&#39;/default/ {print $5}&#39;</span> | head -n1<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>clean<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the container</span>
</span></span><span style="display:flex;"><span>    ip link del macvlandnsm <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Delete existing interfaces in the host namespace</span>
</span></span><span style="display:flex;"><span>    nsenter -t1 -n ip link del macvlandnsm <span style="color:#f92672">||</span> true
</span></span><span style="display:flex;"><span>    exit <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>main <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>src_interface<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_type<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>interface_mode<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>Here is the current state of the boot:</p>
<figure>
    <img loading="lazy" src="uefi-boot.jpg"
         alt="A picture of a screen showing the Pi booted into the UEFI firmware. In the background, it shows the Raspberry Pi raspberry. At the bottom, several shortcuts are shown to enter setup, the shell or continue booting. At the top, text showing an attempt to do a PXE boot via IPv4 and IPv6 is displayed. In both cases, the remote boot failed."/> <figcaption>
            <p>The Pi successfully boots into the UEFI firmware.</p>
        </figcaption>
</figure>
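<p>A side note on the script above: because <code>-s</code>/<code>--src</code> is declared with an optional argument (<code>s::</code>/<code>src::</code>), util-linux <code>getopt</code> only binds a value that is attached to the option, as in <code>-seth0</code> or <code>--src=eth0</code>. Written as <code>--src eth0</code>, the value is not bound at all: <code>getopt</code> emits an empty argument for <code>--src</code> and moves <code>eth0</code> behind the <code>--</code>, so the script silently falls back to the interface of the default route. The following sketch uses a hypothetical helper, <code>bound_src</code>, which is not part of the original script, to make the difference visible (it leaves out the <code>-a</code> flag for brevity):</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the value util-linux getopt binds to -s/--src
# for a given option spec, e.g. "s::"/"src::" (optional, as in the script
# above) or "s:"/"src:" (required). Prints an empty line if nothing is bound.
bound_src() {
    local short_spec="$1" long_spec="$2"
    shift 2
    local args
    # Same normalisation step as the script above
    args=$(getopt -o "${short_spec}" --long "${long_spec}" -- "$@") || return 1
    eval set -- "${args}"
    case "$1" in
        -s | --src) printf '%s\n' "$2" ;;
        *) printf '\n' ;;
    esac
}

bound_src 's::' 'src::' --src eth0   # optional: eth0 is not bound, prints an empty line
bound_src 's::' 'src::' --src=eth0   # optional: prints eth0
bound_src 's:' 'src:' --src eth0     # required: prints eth0
</code></pre>
<p>If the value should also work as a separate word, declaring the option as required (<code>s:</code> and <code>src:</code>) is one way to get there; when the option is left out entirely, the script&rsquo;s default-route fallback still applies.</p>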

<h2 id="getting-the-pi-to-execute-tinkerbells-ipxe-script">Getting the Pi to execute Tinkerbell&rsquo;s iPXE script</h2>
<p>Conveniently, the UEFI firmware also attempts a PXE boot.
This allowed me to continue by pointing this stage of the boot at Tinkerbell&rsquo;s
iPXE binaries. For the most part, these are standard iPXE builds. The
only difference is that Tinkerbell&rsquo;s builds set a DHCP user class
(<code>Tinkerbell</code>) on the requests iPXE sends, which makes those requests
easy to recognize.</p>
<p>Instructing the UEFI firmware to fetch the iPXE binary from Tinkerbell only needed
one additional setting in Dnsmasq:</p>
<pre tabindex="0"><code>pxe-service=ARM64_EFI,&#34;EFI Netboot&#34;,snp.efi,203.0.113.200
</code></pre><p>This line sets the boot file to <code>snp.efi</code> and instructs the UEFI firmware to
fetch it from Tinkerbell (<code>203.0.113.200</code>) instead of from Dnsmasq. This is what the exchange looks like:</p>
<pre tabindex="0"><code>dnsmasq-dhcp[3309]: 3924602938 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[3309]: 3924602938 vendor class: PXEClient:Arch:00011:UNDI:003000
dnsmasq-dhcp[3309]: 3924602938 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[3309]: 3924602938 tags: macvlandnsm
dnsmasq-dhcp[3309]: 3924602938 bootfile name: snp.efi
dnsmasq-dhcp[3309]: 3924602938 server name: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 next server: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 sent size:  1 option: 53 message-type  5
dnsmasq-dhcp[3309]: 3924602938 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[3309]: 3924602938 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[3309]: 3924602938 sent size: 17 option: 97 client-machine-id  00:15:31:c0:00:00:00:00:00:00:00:e4:5f:01...
</code></pre><p>This got me a little bit further, but ended with the Pi dropping me into the
UEFI firmware screen:</p>
<figure>
    <img loading="lazy" src="uefi-fw-screen.jpg"
         alt="A picture of a screen showing the UEFI firmware config screen. Similar to an x86 UEFI menu, it shows information about the Pi like its CPU and RAM. Options include setting the language, and entering submenus for Device Manager, Boot Manager and Boot Maintenance Manager. It doesn&#39;t show any indication of why the UEFI menu is shown."/> <figcaption>
            <p>Instead of booting HookOS, I&rsquo;m ending up in the UEFI config menu.</p>
        </figcaption>
</figure>
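<p>Why did the <code>ARM64_EFI</code> line match at all? The client announces its architecture in the DHCP vendor class, visible in the log above as <code>PXEClient:Arch:00011:UNDI:003000</code>. Architecture type 11 is ARM64 UEFI, which is what Dnsmasq&rsquo;s <code>ARM64_EFI</code> client system architecture name stands for. A small, hypothetical helper to decode that field could look like this; it only covers a handful of common types:</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash"># Hypothetical helper: map the architecture field of a PXE vendor class
# (DHCP option 60), e.g. "PXEClient:Arch:00011:UNDI:003000", to the
# matching Dnsmasq pxe-service CSA name. Only a few types are covered.
pxe_csa_name() {
    local arch
    # Field 3 of the colon-separated vendor class is the architecture type
    arch=$(printf '%s\n' "$1" | cut -d: -f3)
    case "${arch}" in
        00000) echo "x86PC" ;;
        00006) echo "IA32_EFI" ;;
        00010) echo "ARM32_EFI" ;;
        00011) echo "ARM64_EFI" ;;
        *) echo "unknown (${arch})" ;;
    esac
}

pxe_csa_name "PXEClient:Arch:00011:UNDI:003000"   # prints ARM64_EFI
</code></pre>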

<p>To get past the UEFI menu, I had to tell iPXE where to fetch the iPXE script from.
My first attempt was adding the following lines to the Dnsmasq config:</p>
<pre tabindex="0"><code>dhcp-match=tinkerbell, option:user-class, Tinkerbell
pxe-service=tag:tinkerbell,ARM64_EFI,&#34;EFI Netboot IPXE&#34;,http://203.0.113.200/auto.ipxe
</code></pre><p>This had no visible effect: I ended up on the same UEFI screen. Right before
it appeared, an error message flashed by, but too quickly for me to read it.
After some futile attempts at changing the <code>pxe-service</code> line, I gave in
and connected a keyboard. By pressing Ctrl+B right after the iPXE binary started,
I got into an iPXE shell. There, I ran the <code>autoboot</code> command and finally
saw the error: the DHCP response was correct, and iPXE seemed to be fetching the
iPXE script from the right place, but the request failed with &ldquo;Connection
reset by peer&rdquo;. Then it dawned on me: Tinkerbell&rsquo;s HTTP server wasn&rsquo;t
listening on port 80, but on port 7171. So the fix was simple; I changed the two
lines from above to these:</p>
<pre tabindex="0"><code>dhcp-match=tinkerbell, option:user-class, Tinkerbell
dhcp-boot=tag:tinkerbell,&#34;http://203.0.113.200:7171/auto.ipxe&#34;,,&#34;203.0.113.200&#34;
</code></pre><p>The switch from <code>pxe-service</code> to <code>dhcp-boot</code> was necessary because the iPXE binary
did not request PXE options in its DHCP request, so Dnsmasq did
not send a PXE answer. Instead, iPXE just expected the boot file name option to be
set.</p>
<p>The <code>auto.ipxe</code> &ldquo;file&rdquo; is a clever Tinkerbell implementation detail worth
discussing briefly. It is an iPXE script: a sequence of <a href="https://ipxe.org/cmd">iPXE commands</a>
that iPXE runs in batch mode. The script can be found <a href="https://github.com/tinkerbell/tinkerbell/blob/v0.18.3/smee/internal/ipxe/script/hook.go">here</a>.
Instead of delivering a static script, Tinkerbell generates the iPXE script
dynamically for each individual host. In principle, the script always does the
same thing: it loads the kernel and initramfs, defines the kernel command line
and then boots the kernel. But because it is generated per host, the kernel
and initramfs can be set individually for every host in its Hardware manifest.</p>
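<p>To make the idea of a per-host script more concrete, here is a minimal sketch of rendering such a script in Bash. This is not Tinkerbell&rsquo;s actual template (that one is Go code, linked above); the function name, its parameters and the fixed initrd name <code>initramfs</code> are assumptions for illustration. The <code>initrd=initramfs</code> kernel argument is commonly added when booting through iPXE under UEFI, so that the kernel&rsquo;s EFI stub can find the image loaded under that name:</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash"># Hypothetical sketch of per-host iPXE script generation in the spirit of
# Tinkerbell's auto.ipxe; NOT Tinkerbell's actual template.
# Usage: render_ipxe_script MAC KERNEL_URL INITRD_URL [KERNEL_ARG...]
render_ipxe_script() {
    local mac="$1" kernel_url="$2" initrd_url="$3"
    shift 3
    # Name the initramfs explicitly so the kernel command line can refer to it
    local cmdline="initrd=initramfs"
    if [ "$#" -gt 0 ]; then
        cmdline="${cmdline} $*"
    fi
    printf '%s\n' \
        '#!ipxe' \
        "echo Booting ${mac}" \
        "kernel ${kernel_url} ${cmdline}" \
        "initrd --name initramfs ${initrd_url}" \
        'boot'
}

# Placeholder URLs, for illustration only
render_ipxe_script e4:5f:01:bc:f4:ce http://boot.example.com/vmlinuz http://boot.example.com/initramfs console=tty0
</code></pre>
<p>Serving the output of such a function per requesting host, keyed by its MAC or IP address, is the essence of the dynamic approach; the real implementation fills in many more per-host values.</p>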
<h2 id="getting-tinkerbell-to-send-the-autoipxe-script">Getting Tinkerbell to send the auto.ipxe script</h2>
<p>At this point, I was getting errors from Tinkerbell, because I hadn&rsquo;t created a
Hardware object for the Pi yet. I created this one:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">id</span>: <span style="color:#ae81ff">e4:5f:01:bc:f4:ce</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ips</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.70</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_pxe</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operating_system</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">distro</span>: <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">aarch64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">testpi</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">e4:5f:01:bc:f4:ce</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ip</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.70</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">netmask</span>: <span style="color:#ae81ff">255.255.255.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">10.86.25.254</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">userData</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    packages:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - openssh-server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - python3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - sudo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ssh_pwauth: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    disable_root: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    allow_public_ssh_keys: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    timezone: &#34;Europe/Berlin&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    users:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: imhotep
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shell: /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ssh_authorized_keys:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - from=&#34;192.0.2.100&#34; ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt user@foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        sudo: ALL=(ALL:ALL) ALL
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    runcmd:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl enable ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl start ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    power_state:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      delay: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      mode: reboot</span>
</span></span></code></pre></div><p>Note in particular the <code>spec.interfaces[].netboot.allowPXE: false</code> option. It tells
Tinkerbell not to answer the host&rsquo;s DHCP requests during PXE boot.
I had to set it because, by default, Tinkerbell
answers the initial DHCP request with a reply instructing the Pi to
download the iPXE binary from Tinkerbell straight away. That works for a normal
PXE boot, but the Pi&rsquo;s network boot is a bit special: it also needs to fetch files
like <code>config.txt</code> from the TFTP server, and Tinkerbell can&rsquo;t serve those.
Yet. I will go into more detail at the end.</p>
<p>But even with this config set, the <code>auto.ipxe</code> script was not getting delivered.
This time Tinkerbell output the following error message:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-26T19:02:24.911697105Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/ipxe/script/ipxe.go:169&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;the hardware data for this machine, or lack there of, does not allow it to pxe&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;client&#34;</span>:<span style="color:#e6db74">&#34;203.0.113.70:42502&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#66d9ef">null</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-26T19:02:24.911752728Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/ipxe/http/middleware.go:37&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;response&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;method&#34;</span>:<span style="color:#e6db74">&#34;GET&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;uri&#34;</span>:<span style="color:#e6db74">&#34;/auto.ipxe&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;client&#34;</span>:<span style="color:#e6db74">&#34;203.0.113.70&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;duration&#34;</span>:<span style="color:#ae81ff">160896</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;status&#34;</span>:<span style="color:#ae81ff">404</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>So because <code>allowPXE</code> is <code>false</code> for the Pi, it also doesn&rsquo;t get to download
the <code>auto.ipxe</code> script. My ultimate solution was to disable DHCP in
Tinkerbell completely and set <code>allowPXE: true</code> for the Pi.
I disabled DHCP with this setting in Tinkerbell&rsquo;s <code>values.yaml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">envs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">smee</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcpEnabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>After that, the Pi was able to boot into HookOS without further issues. I will talk
about why this is suboptimal in the last section of this post.</p>
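<p>Flipping <code>allowPXE</code> on an existing Hardware object doesn&rsquo;t require re-applying the whole manifest; a JSON patch via <code>kubectl patch</code> does the job. The following is a minimal sketch with a hypothetical helper, <code>set_allow_pxe</code>. It assumes the Hardware object lives in kubectl&rsquo;s current namespace (the manifest above doesn&rsquo;t set one) and that the interface to change is the first entry of <code>spec.interfaces</code> unless an index is given:</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash"># Hypothetical helper: toggle netboot.allowPXE on a Tinkerbell Hardware object.
# Assumptions: the object is in kubectl's current namespace, and the interface
# to change sits at index 0 of spec.interfaces unless a third argument is given.
# Usage: set_allow_pxe HARDWARE_NAME true|false [INTERFACE_INDEX]
set_allow_pxe() {
    local name="$1" value="$2" index="${3:-0}" patch
    # Only accept JSON booleans, anything else would end up in the patch verbatim
    case "${value}" in
        true | false) ;;
        *) return 1 ;;
    esac
    patch=$(printf '[{"op":"replace","path":"/spec/interfaces/%s/netboot/allowPXE","value":%s}]' "${index}" "${value}")
    kubectl patch hardware.tinkerbell.org "${name}" --type=json -p "${patch}"
}
</code></pre>
<p>With DHCP disabled in Tinkerbell, running <code>set_allow_pxe testpi true</code> would flip the flag for the Pi defined above. The <code>replace</code> operation requires the field to exist already, which is the case here since the manifest sets <code>allowPXE: false</code> explicitly.</p>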
<h2 id="provisioning-the-pi">Provisioning the Pi</h2>
<p>Setting up the actual provisioning went rather smoothly after all of that. I
followed the same approach as I did for the VM in the <a href="https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/">previous post</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">pi-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: pi-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;install ubuntu&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/image2disk:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 900
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                IMG_URL: https://s3.example.com/public/images/mypi-image.img
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DEST_DISK: /dev/sda
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                COMPRESSED: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;add cloud-init config&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /etc/cloud/cloud.cfg.d/10_tinkerbell.cfg
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: &#34;0700&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: ext4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: &#34;0&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: &#34;0600&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: &#34;0&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                datasource:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  Ec2:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    metadata_urls: [&#34;http://203.0.113.200:7172&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                    strict_id: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                manage_etc_hosts: localhost
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                warnings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                  dsid_missing_source: off
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;add cloud-init ds-identity&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: ext4
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /etc/cloud/ds-identify.cfg
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                datasource: Ec2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default user data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /user-data
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default meta data&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /meta-data
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;remove default network config&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/writefile:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              FS_TYPE: vfat
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DEST_PATH: /network-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              UID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              GID: 0
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              MODE: 0600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              DIRMODE: 0700
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              CONTENTS: |
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                # Removed during provisioning
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;reboot&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: ghcr.io/jacobweinstock/waitdaemon:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 90
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            pid: host
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            command: [&#34;reboot&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              IMAGE: alpine
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              WAIT_SECONDS: 10
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - /var/run/docker.sock:/var/run/docker.sock</span>
</span></span></code></pre></div><p>The one noteworthy change here is in the files I&rsquo;m removing/emptying. For the
VM, I had created the image myself via the Ubuntu installer and Packer. For
the Pi, I was able to use the official preinstalled Ubuntu Raspberry Pi image instead.
But that image ships with its own default <code>user-data</code>, <code>meta-data</code> and
<code>network-config</code> files. In the Pi image, those are located on the boot partition:</p>
<ul>
<li><code>/user-data</code></li>
<li><code>/meta-data</code></li>
<li><code>/network-config</code></li>
</ul>
<p>With these files removed, the Ubuntu image properly made use of the metadata
server Tinkerbell provides and executed the <code>user-data</code> instructions delivered
by it and defined in the Hardware object of the Raspberry Pi.</p>
<p>So now I finally had a fully provisioned Pi, without any manual intervention:</p>
<figure>
    <img loading="lazy" src="pi-ubuntu-booted.jpg"
         alt="A picture of a screen showing the final lines of an Ubuntu boot. It shows the Ubuntu version as 24.04.2 LTS, some lines indicating that cloud-init ran successfully and finally a login prompt for the host testpi."/> <figcaption>
            <p>Final successful Ubuntu provisioned boot.</p>
        </figcaption>
</figure>

<h2 id="next-steps">Next steps</h2>
<p>The next phase of the Tinkerbell project will require me to don my thinking cap
and probably write some Go code. As I&rsquo;ve shown above, provisioning a Pi 4 is
possible. But provisioning a Pi 5 the same way is not, because the Pi 5 UEFI
project seems to be dead, judging by its <a href="https://github.com/worproject/rpi5-uefi">archived repo</a>.
Additionally, the approach I&rsquo;ve shown above requires DHCP to be completely
switched off in Tinkerbell: I needed to enable <code>allowPXE</code> to get the
<code>auto.ipxe</code> script, but at the same time Tinkerbell cannot provide the files
a Pi needs for the initial PXE boot into UEFI/iPXE.</p>
<p>But there might be a way around all of these issues, which should also work with
the Pi 5: booting into HookOS directly, skipping UEFI and iPXE. This should be
possible by pointing the Pi firmware&rsquo;s <code>config.txt</code> file at HookOS&rsquo; kernel
and initramfs, so that the firmware boots them directly. The downside of this
approach is that I&rsquo;d lose Tinkerbell&rsquo;s ability to adapt, for example, the kernel
command line dynamically, as it does when booting through the iPXE script.
Tinkerbell would only enter the picture after HookOS has already booted.</p>
<p>Then there&rsquo;s also the issue of diskless hosts, which netboot not only for their
initial provisioning via Tinkerbell, but on every boot. The
biggest challenge here is distinguishing between the two cases. When the host needs to
be provisioned, it needs to be told to PXE boot into HookOS. If it is just doing
a normal boot, it needs to boot into its own kernel and initramfs. The best
decision point I can imagine for that is Tinkerbell&rsquo;s workflows. Each workflow
has a state, which is set to &ldquo;done&rdquo; once all of its tasks have
been executed successfully for the given host. So whenever a DHCP request arrives,
I could check whether that host has any pending workflows. If it does, I tell
it to boot into HookOS, and otherwise I have it continue the boot normally.</p>
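<p>The decision described above can be sketched in a few lines of Go. This is a hypothetical helper, not part of Tinkerbell: the <code>Workflow</code> type and the state names are simplified stand-ins for the real Workflow resource, and fetching the workflows from Kubernetes is left out entirely.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// WorkflowState is a simplified stand-in for a Tinkerbell Workflow's state.
// These names are illustrative, not Tinkerbell's API constants.
type WorkflowState string

const (
	StatePending WorkflowState = "pending"
	StateRunning WorkflowState = "running"
	StateDone    WorkflowState = "done"
)

// Workflow keeps only what the boot decision needs.
type Workflow struct {
	HardwareMAC string
	State       WorkflowState
}

// BootTarget is what the DHCP/netboot side would tell the host to do.
type BootTarget string

const (
	BootHookOS BootTarget = "hookos" // provision: netboot into HookOS
	BootNormal BootTarget = "normal" // regular boot into the host's own kernel and initramfs
)

// decideBoot returns BootHookOS if the host with the given MAC has at least
// one pending workflow, and BootNormal otherwise.
func decideBoot(mac string, workflows []Workflow) BootTarget {
	for _, wf := range workflows {
		// MAC addresses may differ in case between DHCP packets and manifests.
		if strings.EqualFold(wf.HardwareMAC, mac) && wf.State == StatePending {
			return BootHookOS
		}
	}
	return BootNormal
}

func main() {
	workflows := []Workflow{
		{HardwareMAC: "10:66:6a:07:8d:0d", State: StateDone},
	}
	fmt.Println(decideBoot("10:66:6A:07:8D:0D", workflows)) // normal

	workflows = append(workflows, Workflow{HardwareMAC: "10:66:6a:07:8d:0d", State: StatePending})
	fmt.Println(decideBoot("10:66:6a:07:8d:0d", workflows)) // hookos
}
```

<p>One design choice here is that only pending workflows trigger a HookOS boot; a workflow that is still marked as running is treated like a finished one. Whether a host that reboots mid-workflow should be sent back into HookOS instead is left open in this sketch.</p>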
<p>Lots to think about. But I&rsquo;m enjoying it - there&rsquo;s certainly been a lot more &ldquo;lab&rdquo;
in my Homelab than usual. &#x1f601;</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part III: Install and First Provisioning</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/</link>
      <pubDate>Sat, 21 Jun 2025 20:30:01 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-3-install-and-first-provisioning/</guid>
      <description>I deploy Tinkerbell on my k3s cluster and provision the first VM with it</description>
      <content:encoded><![CDATA[<p>In this post, I will describe how I deployed <a href="https://tinkerbell.org/">Tinkerbell</a>
into my k3s cluster and provisioned the first Ubuntu VM with it.</p>
<p>This is part 3 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<h2 id="deploying-tinkerbell">Deploying Tinkerbell</h2>
<p>The first step is to deploy Tinkerbell into the k3s cluster I set up in the
<a href="https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/">previous post</a>. For this, I used
the official Helm chart, which can be found <a href="https://github.com/tinkerbell/tinkerbell/tree/main/helm/tinkerbell">here</a>.</p>
<p>My <code>values.yaml</code> file looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">publicIP</span>: <span style="color:#e6db74">&#34;203.0.113.200&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">trustedProxies</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;10.42.0.0/24&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">artifactsFileServer</span>: <span style="color:#e6db74">&#34;http://203.0.113.200:7173&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">deployment</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">envs</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">tinkController</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableLeaderElection</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">smee</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcpMode</span>: <span style="color:#e6db74">&#34;proxy&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">globals</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableRufioController</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">enableSecondstar</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">logLevel</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">init</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">lbClass</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">optional</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hookos</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">lbClass</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">kernelVersion</span>: <span style="color:#e6db74">&#34;both&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">persistence</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">existingClaim</span>: <span style="color:#e6db74">&#34;hookos-volume&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubevip</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">false</span>
</span></span></code></pre></div><p>The first setting, <code>publicIP</code>, is the public IP under which Tinkerbell&rsquo;s services
will be available to other machines. It will be used as the next server in DHCP
responses, in the download URLs of the iPXE scripts and so forth. It will also be set
as the <code>loadBalancerIP</code> in the Service manifest created by the chart. In my
case, this is a VIP controlled by a kube-vip deployment, which I will go into in more
detail later. The <code>trustedProxies</code> entry is just the CIDR for Pods in
my k3s cluster. The <code>artifactsFileServer</code> is the address serving the HookOS artifacts,
in this case the kernel and initrd. The Tinkerbell chart sets up a small Nginx
deployment for this and automatically downloads the newest HookOS artifacts to
it. This is configured under <code>optional.hookos</code>. I&rsquo;m also disabling a few things
because I don&rsquo;t intend to use them. One of those is leader election for
Tinkerbell - as I will only have one deployment, it seems unnecessary. I disable
Rufio and SecondStar as well. Rufio is a component for talking to the baseboard
management controllers usually found in enterprise equipment. As I don&rsquo;t have
any such gear, it&rsquo;s unnecessary. Finally, SecondStar is a serial-over-SSH service
I also don&rsquo;t need.</p>
<p>The <code>dhcpMode</code> of Smee, the DHCP and general netboot component of Tinkerbell,
is more interesting. DHCP servers, especially those providing netboot options,
sometimes need to coexist. In such a setup, one DHCP server does the general IP
management, handing out dynamic and static IPs as well as information like the NTP
and DNS servers, while a second DHCP server only sends out the information
necessary for PXE boot. Most normal DHCP servers can take on that second role as
well; I&rsquo;m currently using <a href="https://thekelleys.org.uk/dnsmasq/doc.html">Dnsmasq</a> to boot
my diskless machines, for example, while normal IP address management is done
by the ISC DHCP server running on my OPNsense router.
Smee supports similar modes. It can either do all of the DHCP itself, handing
out IPs and netboot information, or only hand out netboot info, or not handle
DHCP at all and only serve iPXE binaries and scripts. The
different modes are described in more detail <a href="https://github.com/tinkerbell/smee/blob/main/README.md#dhcp-modes">here</a>.
I&rsquo;m using the proxy mode because I&rsquo;ve already got a DHCP server handling
address management, although I might change that for the actual production
deployment. The reason is that I have to set the machine&rsquo;s static IP in the
Hardware manifest anyway, as I will explain later. And I like the fact
that static IPs would then finally be under version control. Right now, they&rsquo;re
just configured in the OPNsense UI.</p>
<p>The <code>logLevel</code> option is more important than it seems. Without raising it, Tinkerbell
will keep a number of low-priority errors and warnings to itself. These are the
kind of &ldquo;errors&rdquo; that can appear during normal operation, like DHCP packets
arriving for hosts which Tinkerbell doesn&rsquo;t know about. But for me, hiding them made
debugging my setup a bit more difficult. I will talk about that in the next
section.</p>
<p>I&rsquo;m also disabling the kube-vip service that the chart can deploy, and instead
deploy a separate one to have more control over the deployment.</p>
<h2 id="configuring-tinkerbell">Configuring Tinkerbell</h2>
<p>The goal of my first tests was to get a feel for how Tinkerbell ticks. So I
didn&rsquo;t start out by trying to install an OS, but just wanted to see how the
netboot and the Tinkerbell manifests work.</p>
<p>Before launching the VM, I created a couple of manifests for Tinkerbell. The
core of Tinkerbell is the Workflow. It connects a Template containing actions
to be executed with a Hardware representing a host. Here is my initial
configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">x86_64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:07:8d:0d</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">203.0.113.250</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: test-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;echome&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: ghcr.io/jacobweinstock/waitdaemon:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            pid: host
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            command:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - echo &#34;Hello, this is {{ `{{ .machine_mac }}` }}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - echo &#34;Ending script here&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              IMAGE: alpine
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              WAIT_SECONDS: 60
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">              - /var/run/docker.sock:/var/run/docker.sock</span>
</span></span><span style="display:flex;"><span>---
</span></span><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#e6db74">&#34;tinkerbell.org/v1alpha1&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Workflow</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-workflow</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">templateRef</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hardwareRef</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">hardwareMap</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">machine_mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:07:8d:0d</span>
</span></span></code></pre></div><p>Let&rsquo;s start with the Hardware manifest. It defines both characteristics of the
machine and configuration for that machine. It controls the DHCP
as well as the netboot options, including whether the machine is allowed to
PXE boot and whether it is allowed to run workflows. The Hardware is documented in
more detail <a href="https://tinkerbell.org/docs/concepts/hardware/">here</a>. The Hardware
manifest has a lot more options, but for my tests, only these were
relevant.</p>
<p>Next is the Template. This specifies the actions to be executed. In this particular
example, I&rsquo;m only running a couple of simple <code>echo</code> commands, as I was mostly interested
in how the netboot works. Templates are not supposed to be machine-specific,
but are intended to be used by multiple workflows.</p>
<p>And finally, there&rsquo;s the Workflow itself. It specifies a Hardware, meaning a
host, and a Template to apply to that host.
The <code>hardwareMap</code> is a map of values to be made available in Templates; see my
use of <code>machine_mac</code> in the Template to set the <code>worker</code> ID. One downside
of Tinkerbell at the moment is that only the <code>spec.disks</code> value is available
from the Hardware, and none of the other fields. That is why I also had to add
<code>machine_mac</code> to the Workflow&rsquo;s <code>hardwareMap</code>, instead of taking the value from
<code>spec.interfaces[].dhcp.mac</code>.</p>
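<p>The doubled braces in the <code>worker</code> line exist because the Template goes through two rounds of Go templating, assuming the manifests are deployed via Helm as the escaping suggests: Helm renders the chart first, and Tinkerbell then renders the Template with the <code>hardwareMap</code> values. This Go sketch reproduces both passes with the standard <code>text/template</code> package. The <code>render</code> helper is hypothetical, and passing only the <code>hardwareMap</code> as template data is a simplification of what Tinkerbell actually provides.</p>

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes a Go text/template with the given data, using default options.
func render(src string, data any) (string, error) {
	t, err := template.New("tmpl").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// The worker line as written in the chart: the backtick raw string hides
	// the inner template expression from the first rendering pass.
	chartLine := "worker: \"{{`{{.machine_mac}}`}}\""

	// Pass 1, standing in for Helm: the raw string is printed verbatim.
	tinkLine, err := render(chartLine, map[string]any{})
	if err != nil {
		panic(err)
	}
	fmt.Println(tinkLine) // worker: "{{.machine_mac}}"

	// Pass 2, standing in for Tinkerbell: hardwareMap entries become template data.
	hardwareMap := map[string]string{"machine_mac": "10:66:6a:07:8d:0d"}
	out, err := render(tinkLine, hardwareMap)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // worker: "10:66:6a:07:8d:0d"

	// An unescaped expression is already consumed by the first pass.
	lost, _ := render("echo {{ .machine_mac }}", map[string]any{})
	fmt.Println(lost) // echo <no value>
}
```

<p>Plain <code>text/template</code> prints <code>&lt;no value&gt;</code> for the missing key in the last case; either way, an expression that isn&rsquo;t escaped is gone before Tinkerbell ever sees it.</p>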
<p>To summarize what this configuration is supposed to achieve: When Tinkerbell
receives a DHCP request from a machine with the MAC address <code>10:66:6a:07:8d:0d</code>,
it will send it some netboot information, namely itself as the next server
and an iPXE binary to boot. When executed by the netbooting host, that binary
will fetch an iPXE script, again from Tinkerbell. That script will then download
the kernel and initrd for HookOS from Tinkerbell&rsquo;s Nginx deployment. Once
HookOS has booted, it will launch the Tink worker in Docker, which requests
a workflow from Tinkerbell. The worker will get the <code>echome</code> action delivered and execute
it. Right now, that only runs a couple of echo commands.</p>
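<p>One detail of this chain is how the DHCP server tells the first request, coming from the firmware&rsquo;s PXE client, apart from the second one, coming from the freshly loaded iPXE binary. A common approach is to check the user class (DHCP option 77), which iPXE sets to <code>iPXE</code>, and to pick the binary based on the client architecture (DHCP option 93). The Go sketch below shows that pattern; I haven&rsquo;t verified that Smee implements it exactly this way, and the binary names and script URL are placeholders.</p>

```go
package main

import "fmt"

// BootRequest holds the two DHCP options a netboot server usually inspects.
type BootRequest struct {
	UserClass  string // option 77; iPXE sends "iPXE" here
	ClientArch uint16 // option 93, client system architecture (RFC 4578)
}

// bootFile decides what goes into the boot file field of the reply. The
// binary names are hypothetical placeholders, not Smee's actual files.
func bootFile(req BootRequest, scriptURL string) (string, error) {
	// Second round: iPXE is already running, so hand it the script. Without
	// this check the client would chainload the iPXE binary over and over.
	if req.UserClass == "iPXE" {
		return scriptURL, nil
	}
	// First round: the firmware's PXE client gets an iPXE binary for its arch.
	switch req.ClientArch {
	case 0: // x86 BIOS
		return "undionly.kpxe", nil
	case 7, 9: // x86-64 UEFI (firmware commonly sends either value)
		return "ipxe.efi", nil
	case 11: // ARM64 UEFI
		return "ipxe-arm64.efi", nil
	default:
		return "", fmt.Errorf("unsupported client architecture %d", req.ClientArch)
	}
}

func main() {
	script := "http://203.0.113.200/auto.ipxe" // hypothetical script URL
	first, _ := bootFile(BootRequest{ClientArch: 7}, script)
	second, _ := bootFile(BootRequest{UserClass: "iPXE", ClientArch: 7}, script)
	fmt.Println(first)  // ipxe.efi
	fmt.Println(second) // http://203.0.113.200/auto.ipxe
}
```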
<p>But that did not work out as expected, at least initially.</p>
<h2 id="dhcp-problems">DHCP problems</h2>
<p>For my testing, I needed another VM. And it couldn&rsquo;t have a normal image,
because I ultimately wanted to install a fresh OS on it. Luckily, Incus supports
the <code>--empty</code> parameter, which creates a VM with a root disk but without
putting an image on it. I created my test VM like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>incus init test-vm --empty --vm -c limits.cpu<span style="color:#f92672">=</span><span style="color:#ae81ff">4</span> -c limits.memory<span style="color:#f92672">=</span>4GiB --profile base --profile disk-vms -d network,hwaddr<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;10:66:6a:07:8d:0d&#34;</span>
</span></span></code></pre></div><p>This command creates a VM with an empty 20 GB root disk. The VM also gets
4 GiB of RAM and 4 CPU cores. I&rsquo;m also hardcoding the MAC address of the
NIC. This was a later addition: I deleted the VM multiple times during
testing, and getting a new MAC each time it was recreated became annoying, because
I had to change the static DHCP lease and the Tinkerbell config every time.</p>
<p>Then I launched the VM and saw - nothing. It tried to PXE boot, but did not get
any netboot info, so I got dropped into a UEFI shell. I looked over my configuration,
but couldn&rsquo;t find anything wrong. So I ran a quick test to see whether packets sent
to UDP port 67 made it into the Tinkerbell Pod at all:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;foo&#34;</span> | nc -u 203.0.113.200 <span style="color:#ae81ff">67</span>
</span></span></code></pre></div><p>And indeed, the packet seemed to reach Tinkerbell, as I saw this in the logs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-01T20:48:36.172709819Z&#34;</span>,<span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;0&#34;</span>,<span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/dhcp/server/dhcp.go:62&#34;</span>,<span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;error parsing DHCPv4 request&#34;</span>,<span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,<span style="color:#f92672">&#34;err&#34;</span>:<span style="color:#e6db74">&#34;buffer too short at position 4: have 0 bytes, want 4 bytes&#34;</span>}
</span></span></code></pre></div><p>I wasn&rsquo;t sending a DHCP message, so it was understandable that Tinkerbell didn&rsquo;t
know what to do with it. But it showed that, in principle, the ServiceLB of k3s was
working, while the VM&rsquo;s DHCP packets apparently weren&rsquo;t getting through. Next, I ran
tcpdump on the VM running Tinkerbell to see whether the DHCP packets even made it to
the machine itself:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>tcpdump: verbose output suppressed, use -v<span style="color:#f92672">[</span>v<span style="color:#f92672">]</span>... <span style="color:#66d9ef">for</span> full protocol decode
</span></span><span style="display:flex;"><span>listening on enp5s0, link-type EN10MB <span style="color:#f92672">(</span>Ethernet<span style="color:#f92672">)</span>, snapshot length <span style="color:#ae81ff">262144</span> bytes
</span></span><span style="display:flex;"><span>23:02:42.984176 IP 0.0.0.0.bootpc &gt; 255.255.255.255.bootps: BOOTP/DHCP, Request from 10:66:6a:07:8d:0d <span style="color:#f92672">(</span>oui Unknown<span style="color:#f92672">)</span>, length <span style="color:#ae81ff">253</span>
</span></span><span style="display:flex;"><span>E...V...@.#..........D.C...4.....3.......................fj...................................................................................................................................................................................
</span></span><span style="display:flex;"><span>..........................c.Sc5..9...7.....
</span></span><span style="display:flex;"><span>23:02:42.984524 IP _gateway.bootps &gt; 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>E..H.......B
</span></span><span style="display:flex;"><span>V.......C.D.4.......3..........
</span></span><span style="display:flex;"><span>V...........fj.............................................................................................................................................................................................................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..3....T........
</span></span><span style="display:flex;"><span>V....
</span></span><span style="display:flex;"><span>V.............................
</span></span><span style="display:flex;"><span>23:02:46.363155 IP 0.0.0.0.bootpc &gt; 255.255.255.255.bootps: BOOTP/DHCP, Request from 10:66:6a:07:8d:0d <span style="color:#f92672">(</span>oui Unknown<span style="color:#f92672">)</span>, length <span style="color:#ae81ff">265</span>
</span></span><span style="display:flex;"><span>E..%V...@.#..........D.C...l.....3.......................fj...................................................................................................................................................................................
</span></span><span style="display:flex;"><span>..........................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..2.
</span></span><span style="display:flex;"><span>V..9...7.....
</span></span><span style="display:flex;"><span>23:02:46.363507 IP _gateway.bootps &gt; 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length <span style="color:#ae81ff">300</span>
</span></span><span style="display:flex;"><span>E..H.......B
</span></span><span style="display:flex;"><span>V.......C.D.4.......3..........
</span></span><span style="display:flex;"><span>V...........fj.............................................................................................................................................................................................................c.Sc5..6.
</span></span><span style="display:flex;"><span>V..3....P........
</span></span><span style="display:flex;"><span>V....
</span></span><span style="display:flex;"><span>V.............................
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">4</span> packets captured
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">4</span> packets received by filter
</span></span><span style="display:flex;"><span><span style="color:#ae81ff">0</span> packets dropped by kernel
</span></span></code></pre></div><p>So yes, the packets did arrive at the machine, and on the right interface.
Running tcpdump in the network namespace of the Tinkerbell Pod showed no packets
arriving there, though. So I dug a bit deeper into what k3s&rsquo; ServiceLB actually
does, and found this output in the logs of its svclb Pod:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>kmaster logs -n kube-system svclb-tinkerbell-01c2218a-p69fs -c lb-udp-67
</span></span><span style="display:flex;"><span>+ trap exit TERM INT
</span></span><span style="display:flex;"><span>+ BIN_DIR<span style="color:#f92672">=</span>/usr/sbin
</span></span><span style="display:flex;"><span>+ check_iptables_mode
</span></span><span style="display:flex;"><span>+ set +e
</span></span><span style="display:flex;"><span>+ lsmod
</span></span><span style="display:flex;"><span>+ grep -qF nf_tables
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#ae81ff">0</span> <span style="color:#e6db74">&#39;=&#39;</span> <span style="color:#ae81ff">0</span> <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ mode<span style="color:#f92672">=</span>nft
</span></span><span style="display:flex;"><span>+ set -e
</span></span><span style="display:flex;"><span>+ info <span style="color:#e6db74">&#39;nft mode detected&#39;</span>
</span></span><span style="display:flex;"><span>+ set_nft
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>INFO<span style="color:#f92672">]</span>  nft mode detected
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables-save
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/iptables-restore
</span></span><span style="display:flex;"><span>+ ln -sf xtables-nft-multi /usr/sbin/ip6tables
</span></span><span style="display:flex;"><span>+ start_proxy
</span></span><span style="display:flex;"><span>+ echo 0.0.0.0/0
</span></span><span style="display:flex;"><span>+ grep -Eq :
</span></span><span style="display:flex;"><span>+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p UDP --dport <span style="color:#ae81ff">32562</span> -j ACCEPT
</span></span><span style="display:flex;"><span>+ echo 203.0.113.200
</span></span><span style="display:flex;"><span>+ grep -Eq :
</span></span><span style="display:flex;"><span>+ cat /proc/sys/net/ipv4/ip_forward
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#ae81ff">1</span> <span style="color:#e6db74">&#39;==&#39;</span> <span style="color:#ae81ff">1</span> <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ iptables -t filter -A FORWARD -d 203.0.113.200/32 -p UDP --dport <span style="color:#ae81ff">32562</span> -j DROP
</span></span><span style="display:flex;"><span>+ iptables -t nat -I PREROUTING -p UDP --dport <span style="color:#ae81ff">67</span> -j DNAT --to 203.0.113.200:32562
</span></span><span style="display:flex;"><span>+ iptables -t nat -I POSTROUTING -d 203.0.113.200/32 -p UDP -j MASQUERADE
</span></span><span style="display:flex;"><span>+ <span style="color:#e6db74">&#39;[&#39;</span> <span style="color:#e6db74">&#39;!&#39;</span> -e /pause <span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>+ mkfifo /pause
</span></span></code></pre></div><p>What I <em>thought</em> I could read from that setup was that only packets
addressed to the exact IP of the host, <code>203.0.113.200</code>, would be forwarded to
the Tinkerbell Pod. But the initial DHCP discovery packets are, of course, sent
to the broadcast address, as the tcpdump output above shows. So I concluded
that these packets were simply being dropped, because they were not addressed to
the unicast address of the host. I&rsquo;m no longer 100% sure about that, though:
in later testing, with kube-vip as the LoadBalancer instead of ServiceLB, I got a
similar result, with no reaction from Tinkerbell in the logs. But then I figured
out that I had set the log level too low.</p>
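<p>Looking at the rules again, one detail that supports that doubt is that the
<code>PREROUTING</code> DNAT rule has no destination match at all: it rewrites any UDP
packet to port 67, while only the <code>FORWARD</code> DROP and <code>POSTROUTING</code> rules are
restricted to <code>203.0.113.200/32</code>. The following POSIX shell sketch pulls the
<code>-d</code> argument out of a rule to make that visible. The function name
<code>rule_dest</code> is hypothetical, it only recognizes the short <code>-d</code> flag (not
<code>--destination</code>), and it says nothing about how the kernel actually routes
broadcast packets after the DNAT.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/sh
# Hypothetical helper: print the destination match (-d) of an iptables rule,
# or "any" if the rule has none. Only the short -d flag is recognized.
rule_dest() (
  set -f        # disable globbing while splitting the rule into words
  set -- $1     # split the rule string on whitespace into positional args
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "-d" ]; then
      echo "$2"
      exit 0    # the function body is a subshell, so exit only leaves it
    fi
    shift
  done
  echo any
)

# Two of the rules from the svclb log above
rule_dest 'iptables -t nat -I PREROUTING -p UDP --dport 67 -j DNAT --to 203.0.113.200:32562'
rule_dest 'iptables -t filter -A FORWARD -d 203.0.113.200/32 -p UDP --dport 32562 -j DROP'
</code></pre></div><p>For the <code>PREROUTING</code> rule it prints <code>any</code>, for the <code>FORWARD</code> DROP rule
<code>203.0.113.200/32</code>.</p>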
<p>But at this point, I still thought that ServiceLB was the problem. So I decided
to disable it and deploy <a href="https://kube-vip.io/">kube-vip</a> instead. I already
had experience with it, as I&rsquo;m using it as the VIP provider for the k8s API in
my main cluster.</p>
<p>I deployed kube-vip with this DaemonSet:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">DaemonSet</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostNetwork</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">serviceAccountName</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">kube-vip</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">ghcr.io/kube-vip/kube-vip:v0.9.1</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">imagePullPolicy</span>: <span style="color:#ae81ff">IfNotPresent</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#ae81ff">manager</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_enable</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_arp</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">vip_leaderelection</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">svc_election</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">add</span>:
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">NET_ADMIN</span>
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">NET_RAW</span>
</span></span><span style="display:flex;"><span>              - <span style="color:#ae81ff">SYS_TIME</span>
</span></span></code></pre></div><p>With this config, kube-vip watches for LoadBalancer Services and announces
their IPs via ARP. I&rsquo;ve disabled all leader elections, as this k3s cluster
will only ever have a single node.
Kube-vip itself does not have any IPAM functionality; it relies either on annotations
on the Service or on its <code>loadBalancerIP</code> setting. The Tinkerbell chart already
sets <code>loadBalancerIP</code> to the <code>publicIP</code> value from the <code>values.yaml</code> file,
so I simply relied on that.</p>
<p>But that did not seem to fix my problem: there still wasn&rsquo;t any reaction from
Tinkerbell to the DHCP requests. That was when I finally realized that I had
never increased Tinkerbell&rsquo;s log level. &#x1f926;
Once I did, I got some results:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;time&#34;</span>:<span style="color:#e6db74">&#34;2025-06-07T22:04:38.322503545Z&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;level&#34;</span>:<span style="color:#e6db74">&#34;-1&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;caller&#34;</span>:<span style="color:#e6db74">&#34;smee/internal/dhcp/handler/proxy/proxy.go:211&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;msg&#34;</span>:<span style="color:#e6db74">&#34;Ignoring packet&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;service&#34;</span>:<span style="color:#e6db74">&#34;smee&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;mac&#34;</span>:<span style="color:#e6db74">&#34;10:66:6a:07:8d:0d&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;xid&#34;</span>:<span style="color:#e6db74">&#34;0xfd39e0af&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;interface&#34;</span>:<span style="color:#e6db74">&#34;macvlan0&#34;</span>,
</span></span><span style="display:flex;"><span><span style="color:#f92672">&#34;error&#34;</span>:<span style="color:#e6db74">&#34;failed to convert hardware to DHCP data: no IP data&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>I didn&rsquo;t have time to dig deeper into that error back then, but I did open
<a href="https://github.com/tinkerbell/tinkerbell/issues/197">this issue</a>, requesting
that the log level of the above message be raised, so that it appears with the
default logging settings. It turned out that I had actually run into a bug:
my Hardware manifest was fine, but Tinkerbell erroneously required some IP
configuration. This has since been fixed.</p>
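<p>Once the log level is raised, Tinkerbell&rsquo;s logs get quite chatty. To pick out
only the ignored DHCP packets of a single machine, a plain <code>grep</code> filter is
enough. This is a minimal sketch, assuming that each log entry is a single line
of compact JSON (the entry above is only pretty-printed for readability); the
function name <code>ignored_packets</code> is hypothetical, and a real JSON tool like
<code>jq</code> would be more robust.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/sh
# Hypothetical helper: read Tinkerbell JSON log lines on stdin and print only
# the "Ignoring packet" entries for the MAC address given as first argument.
# Assumes one compact JSON object per line (no spaces after the colons).
ignored_packets() {
  grep -F '"msg":"Ignoring packet"' | grep -F "\"mac\":\"$1\""
}

# Example input: the log entry from above, as a single line
line='{"time":"2025-06-07T22:04:38.322503545Z","level":"-1","caller":"smee/internal/dhcp/handler/proxy/proxy.go:211","msg":"Ignoring packet","service":"smee","mac":"10:66:6a:07:8d:0d","xid":"0xfd39e0af","interface":"macvlan0","error":"failed to convert hardware to DHCP data: no IP data"}'

printf '%s\n' "$line" | ignored_packets 10:66:6a:07:8d:0d
</code></pre></div><p>In practice, the input would be the Tinkerbell Pod&rsquo;s logs piped into the
function. Note that <code>grep -F</code> compares literally and case-sensitively, so the
MAC has to be written in lowercase, as it appears in the log.</p>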
<h2 id="first-successful-boot">First successful boot</h2>
<p>And with that fix, I finally got my first successful netboot:
<figure>
    <img loading="lazy" src="first-hookos-boot.png"
         alt="A screenshot of a Linux terminal. It shows the command prompt after a fresh boot. The initial text welcomes the user to HookOS, Tinkerbell&#39;s in-memory boot OS. The output also indicates that the OS is based on LinuxKit and the 5.10 kernel. Furthermore, it informs the user that the &#39;docker&#39; command can be used to access tink&#39;s worker container."/> <figcaption>
            <p>Screenshot of my first successful HookOS network boot.</p>
        </figcaption>
</figure>
</p>
<p>That was pretty nice to see. But there was something even better going on in
the background. First of all, the two <code>echo</code> commands I had configured to run
as tasks upon boot did run. The cool part was how I was able to verify that:
Tinkerbell launches a syslog server and configures the in-memory HookOS to
forward its logs to it. Tinkerbell then prints those logs in its own output.
This is a really nice and convenient way to see what&rsquo;s happening on the
remote machine.</p>
<h2 id="side-quest-generating-an-ubuntu-image">Side Quest: Generating an Ubuntu image</h2>
<p>The obvious next step was to install an entire OS instead of just outputting some
text. But for that, I first needed a new image. My current image pipeline produces
individual images for each host, which is clumsy and should be unnecessary.
Something like cloud-init should be able to do all of the initial setup I need
to prepare a machine for Ansible management. Instead of just using Ubuntu&rsquo;s
cloud images, though, I wanted to create my own.</p>
<p>Initially, I looked at <a href="https://github.com/canonical/ubuntu-image">ubuntu-image</a>.
That&rsquo;s the tool Canonical uses to produce the official Ubuntu images.
But it went a bit too deep for me, and I wasn&rsquo;t able to really grok how it worked.
In addition, while the current image was for an x86 VM with a local disk, I would
also need images for Raspberry Pis without any local storage. Those would
definitely need some adaptations, as they require a special initramfs. It didn&rsquo;t look
like that would be easily possible with ubuntu-image, so I would have had to use
Packer/Ansible for those anyway. In the end, I would have had different tools for
different images, which I didn&rsquo;t really like.</p>
<p>So I decided to stick with my Packer approach. One problem with it was that it
reboots the image after installation and runs Ansible against it. With cloud-init,
that reboot would count as the first boot, so cloud-init would not run again on the
first boot after the image is actually installed on a machine. But it needs to. So
I looked for a way to disable provisioning in Packer, and found it in <a href="https://github.com/hashicorp/packer/issues/1591">this issue</a>.</p>
<p>My <a href="https://developer.hashicorp.com/packer">HashiCorp Packer</a> file looks like
this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">locals</span> {
</span></span><span style="display:flex;"><span>  ubuntu-major <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  ubuntu-minor <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;2&#34;</span>
</span></span><span style="display:flex;"><span>  ubuntu-arch <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;amd64&#34;</span>
</span></span><span style="display:flex;"><span>  out_dir <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;img-name&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base-${local.ubuntu-major}.${local.ubuntu-minor}-${local.ubuntu-arch}&#34;</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;s3-access&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/s3-creds&#34;, &#34;access&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">local</span> <span style="color:#e6db74">&#34;s3-secret&#34;</span> {
</span></span><span style="display:flex;"><span>  expression <span style="color:#f92672">=</span> <span style="color:#66d9ef">vault</span>(<span style="color:#e6db74">&#34;secret/s3-creds&#34;, &#34;secret&#34;</span>)
</span></span><span style="display:flex;"><span>  sensitive <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">source</span> <span style="color:#e6db74">&#34;qemu&#34; &#34;ubuntu-base&#34;</span> {
</span></span><span style="display:flex;"><span>  iso_url           <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;https://releases.ubuntu.com/${local.ubuntu-major}/ubuntu-${local.ubuntu-major}.${local.ubuntu-minor}-live-server-${local.ubuntu-arch}.iso&#34;</span>
</span></span><span style="display:flex;"><span>  iso_checksum      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;sha256:d6dab0c3a657988501b4bd76f1297c053df710e06e0c3aece60dead24f270b4d&#34;</span>
</span></span><span style="display:flex;"><span>  output_directory  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_command  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  shutdown_timeout  <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;1h&#34;</span>
</span></span><span style="display:flex;"><span>  disk_size         <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;8G&#34;</span>
</span></span><span style="display:flex;"><span>  cpus              <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>  memory            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;4096&#34;</span>
</span></span><span style="display:flex;"><span>  format            <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;raw&#34;</span>
</span></span><span style="display:flex;"><span>  accelerator       <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;kvm&#34;</span>
</span></span><span style="display:flex;"><span>  firmware          <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/usr/share/edk2-ovmf/OVMF_CODE.fd&#34;</span>
</span></span><span style="display:flex;"><span>  net_device        <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio-net&#34;</span>
</span></span><span style="display:flex;"><span>  disk_interface    <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtio&#34;</span>
</span></span><span style="display:flex;"><span>  communicator      <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;none&#34;</span>
</span></span><span style="display:flex;"><span>  vm_name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${local.img-name}&#34;</span>
</span></span><span style="display:flex;"><span>  http_content      <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;/user-data&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">file</span>(<span style="color:#e6db74">&#34;${path.root}/files/ubuntu-base-autoinstall&#34;</span>)
</span></span><span style="display:flex;"><span>    &#34;/meta-data&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>  boot_command <span style="color:#f92672">=</span> [&#34;&lt;wait&gt;e&lt;wait5&gt;&#34;, &#34;&lt;down&gt;&lt;wait&gt;&lt;down&gt;&lt;wait&gt;&lt;down&gt;&lt;wait2&gt;&lt;end&gt;&lt;wait5&gt;&#34;, &#34;&lt;bs&gt;&lt;bs&gt;&lt;bs&gt;&lt;bs&gt;&lt;wait&gt;autoinstall ds<span style="color:#f92672">=</span>nocloud-net\\;s<span style="color:#f92672">=</span><span style="color:#66d9ef">http</span><span style="color:#960050;background-color:#1e0010">://</span>{{ .<span style="color:#66d9ef">HTTPIP</span> }}<span style="color:#960050;background-color:#1e0010">:</span>{{ .<span style="color:#66d9ef">HTTPPort</span> }}<span style="color:#960050;background-color:#1e0010">/</span> <span style="color:#960050;background-color:#1e0010">---&lt;</span><span style="color:#66d9ef">wait</span><span style="color:#960050;background-color:#1e0010">&gt;&lt;</span><span style="color:#66d9ef">f10</span><span style="color:#960050;background-color:#1e0010">&gt;&#34;</span>]
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">build</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;ubuntu-base-${local.ubuntu-major}.${local.ubuntu-minor}-${local.ubuntu-arch}&#34;</span>
</span></span><span style="display:flex;"><span>  sources <span style="color:#f92672">=</span> [<span style="color:#e6db74">&#34;source.qemu.ubuntu-base&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">post</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">processor</span> <span style="color:#e6db74">&#34;shell-local&#34;</span> {
</span></span><span style="display:flex;"><span>    script <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${path.root}/scripts/s3-upload.sh&#34;</span>
</span></span><span style="display:flex;"><span>    environment_vars <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>      &#34;OUT_DIR<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">abspath</span>(<span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">out_dir</span>)<span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;OUT_NAME<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">img-name</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_PROVIDER<span style="color:#f92672">=</span><span style="color:#66d9ef">Ceph</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_TYPE<span style="color:#f92672">=</span><span style="color:#66d9ef">s3</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_ACCESS_KEY_ID<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">s3-access</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_SECRET_ACCESS_KEY<span style="color:#f92672">=</span><span style="color:#e6db74">${</span><span style="color:#960050;background-color:#1e0010">local</span>.<span style="color:#960050;background-color:#1e0010">s3-secret</span><span style="color:#e6db74">}</span><span style="color:#960050;background-color:#1e0010">&#34;</span>,
</span></span><span style="display:flex;"><span>      &#34;RCLONE_CONFIG_CEPHS3_ENDPOINT<span style="color:#f92672">=</span><span style="color:#66d9ef">https</span><span style="color:#960050;background-color:#1e0010">://</span><span style="color:#66d9ef">s3</span>.<span style="color:#66d9ef">example</span>.<span style="color:#66d9ef">com</span><span style="color:#960050;background-color:#1e0010">&#34;</span>
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>This Packer file starts by downloading the current Ubuntu Server 24.04.2 LTS
install image. It then uses Packer&rsquo;s <a href="https://developer.hashicorp.com/packer/integrations/hashicorp/qemu/latest/components/builder/qemu">QEMU plugin</a>
to launch a VM on the machine where the Packer build is executed.
The way the automation works always amuses me. Look at the <code>boot_command</code>
parameter above: Packer simply takes control of the keyboard and types what
you would type to start an Ubuntu autoinstall. The small HTTP server that
supplies the <code>user-data</code> file is started automatically by Packer and made
available to the VM. That file uses Ubuntu&rsquo;s <a href="https://canonical-subiquity.readthedocs-hosted.com/en/latest/index.html">autoinstall</a>
to automate the installation:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e">#cloud-config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">autoinstall</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">version</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">identity</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">hostname</span>: <span style="color:#e6db74">&#34;ubuntu-base&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">password</span>: <span style="color:#e6db74">&#34;$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">username</span>: <span style="color:#ae81ff">ubuntu</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">locale</span>: <span style="color:#ae81ff">en_US.UTF-8</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">source</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">id</span>: <span style="color:#ae81ff">ubuntu-server-minimal</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">storage</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">layout</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name</span>: <span style="color:#ae81ff">direct</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ssh</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">install-server</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">late-commands</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">echo &#39;ubuntu ALL=(ALL) NOPASSWD:ALL&#39; &gt; /target/etc/sudoers.d/sysuser</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">shutdown</span>: <span style="color:#ae81ff">poweroff</span>
</span></span></code></pre></div><p>Not much configuration is necessary here. I create the <code>ubuntu</code> user
only as an escape hatch, so that I still have a way into the machine if something
goes wrong in later provisioning steps. It&rsquo;s removed in the first
steps of my Homelab Ansible playbook.</p>
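<p>The <code>password</code> field in the <code>identity</code> section expects a crypted password
hash rather than the plain password; the <code>$6$</code> prefix marks a SHA-512 crypt
hash. One way to generate such a hash is OpenSSL&rsquo;s <code>passwd</code> subcommand, as in
this small sketch. It assumes OpenSSL 1.1.1 or newer, which added the <code>-6</code>
option; the function name <code>make_password_hash</code> is hypothetical, and
<code>mkpasswd -m sha-512</code> from Debian/Ubuntu&rsquo;s <code>whois</code> package would be an
alternative.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/sh
# Hypothetical helper: print a SHA-512 crypt hash ($6$salt$hash) of the
# password given as first argument, for use in autoinstall's identity.password.
# Requires OpenSSL 1.1.1 or newer for the -6 option.
make_password_hash() {
  # Without -salt, openssl generates a random salt on every call
  openssl passwd -6 "$1"
}

make_password_hash 'my-escape-hatch-password'
</code></pre></div><p>Because of the random salt, the output differs on every call, but any of the
resulting hashes works in the <code>password</code> field.</p>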
<p>As noted above, I don&rsquo;t need any additional customization here: the plan
was to create a small, generic image that I could then customize once it
was installed on a machine.</p>
<p>The last interesting part is the post-processor in the Packer file. Here, I
wrote a little script that uploads the finished image to my S3 storage, so
Tinkerbell has a place to install it from. This is what the <code>s3-upload.sh</code>
script looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/sh
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> ! command -v rclone &gt; /dev/null; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Command rclone not found, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>image<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>OUT_DIR<span style="color:#e6db74">}</span><span style="color:#e6db74">/</span><span style="color:#e6db74">${</span>OUT_NAME<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> ! -f <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>  echo <span style="color:#e6db74">&#34;Could not find image &#39;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;, aborting.&#34;</span>
</span></span><span style="display:flex;"><span>  exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span><span style="display:flex;"><span>echo <span style="color:#e6db74">&#34;Copying </span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">...&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>rclone copy <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>image<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> cephs3:public/images/ <span style="color:#f92672">||</span> exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>exit <span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>It uses <a href="https://rclone.org/">rclone</a> to upload the image file to S3. One advantage
of starting out with a generic image is that it doesn&rsquo;t contain any secrets or
credentials, so there&rsquo;s no problem with putting it in an (internally) public S3
bucket.
The credentials for the S3 upload are taken from Vault via Packer&rsquo;s Vault integration
and end up in the <code>s3-access</code> and <code>s3-secret</code> variables at the beginning of the Packer file.</p>
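<p>The post doesn&rsquo;t show how the <code>cephs3</code> remote itself is defined. One option that avoids an <code>rclone.conf</code> with credentials on the build host is rclone&rsquo;s support for configuring a remote entirely through <code>RCLONE_CONFIG_CEPHS3_*</code> environment variables. A sketch, assuming Packer passes the Vault values to the script as <code>S3_ACCESS</code>, <code>S3_SECRET</code> and <code>S3_ENDPOINT</code> (hypothetical names, as is the function name):</p>

```shell
# Sketch: define the rclone remote "cephs3" purely through environment variables,
# so no rclone.conf containing credentials has to exist on the build host.
# Assumption: S3_ACCESS, S3_SECRET and S3_ENDPOINT are hypothetical names for the
# values Packer hands to the post-processor script.
configure_cephs3_remote() {
  # Fail early if any of the assumed inputs is missing or empty
  for var in S3_ACCESS S3_SECRET S3_ENDPOINT; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "Missing ${var}, aborting." >&2
      return 1
    fi
  done
  # rclone picks up RCLONE_CONFIG_CEPHS3_* variables as the definition of "cephs3"
  export RCLONE_CONFIG_CEPHS3_TYPE=s3
  export RCLONE_CONFIG_CEPHS3_PROVIDER=Ceph
  export RCLONE_CONFIG_CEPHS3_ACCESS_KEY_ID="$S3_ACCESS"
  export RCLONE_CONFIG_CEPHS3_SECRET_ACCESS_KEY="$S3_SECRET"
  export RCLONE_CONFIG_CEPHS3_ENDPOINT="$S3_ENDPOINT"
}
```

<p>Calling <code>configure_cephs3_remote || exit 1</code> before the <code>rclone copy</code> line would make <code>cephs3:public/images/</code> usable without any config file.</p>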
<h2 id="provisioning-the-vm-via-tinkerbell">Provisioning the VM via Tinkerbell</h2>
<p>And now finally, I was ready to fully provision a VM with Tinkerbell. This
requires an update of the Tinkerbell Template, which now looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-template</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">data</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    name: test-template
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    version: &#34;0.1&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    global_timeout: 600
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    tasks:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: &#34;os installation&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        worker: &#34;{{`{{.machine_mac}}`}}&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        volumes:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev:/dev
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - /dev/console:/dev/console
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        actions:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - name: &#34;install ubuntu&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            image: quay.io/tinkerbell/actions/image2disk:latest
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            timeout: 900
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">            environment:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                IMG_URL: {{ .Values.images.ubuntuBaseAmd64 }}
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                DEST_DISK: /dev/sda
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">                COMPRESSED: false</span>
</span></span></code></pre></div><p>And that just worked, right out of the box. The Tinkerbell <code>image2disk</code> action
downloaded the image from S3 and automatically put it onto the VM&rsquo;s local disk.</p>
<p>And just like that, I had a fully deployed VM, provisioned via Tinkerbell. &#x1f389;</p>
<p>But not so fast. Of course, the first thing still missing was a proper cloud-init
config to set up my standard Ansible user, so I could run my standard playbook.
<a href="https://cloud-init.io/">Cloud-init</a> can download configuration for the initial
boot from a cloud provider, in the form of <code>user-data</code> and <code>vendor-data</code>.
It runs in several phases during boot: first from local config files, before the network
is available, and then afterwards from <code>user-data</code> provided, e.g.,
by the cloud provider via an HTTP server. The <code>user-data</code> and <code>vendor-data</code> can
also be provided entirely from local files. Cloud-init supports a wide range of configuration,
from creating local users and installing packages to configuring mounts and networking.</p>
<p>To supply this cloud-init data, Tinkerbell has the <a href="https://tinkerbell.org/docs/services/tootles/">Tootles</a>
component. It implements AWS&rsquo; EC2 metadata service API, which is also supported
by cloud-init. The metadata reported by Tootles for any given instance is
supplied via the Hardware object:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">tinkerbell.org/v1alpha1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Hardware</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">instance</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">id</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:5a:91:8c</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ips</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.20</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allow_pxe</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">operating_system</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">distro</span>: <span style="color:#e6db74">&#34;ubuntu&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">version</span>: <span style="color:#e6db74">&#34;24.04&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">disks</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">device</span>: <span style="color:#ae81ff">/dev/sda</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">interfaces</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">dhcp</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">arch</span>: <span style="color:#ae81ff">x86_64</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">hostname</span>: <span style="color:#ae81ff">test-vm</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">mac</span>: <span style="color:#ae81ff">10</span>:<span style="color:#ae81ff">66</span>:<span style="color:#ae81ff">6a:5a:91:8c</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ip</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">address</span>: <span style="color:#ae81ff">203.0.113.20</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">netmask</span>: <span style="color:#ae81ff">255.255.255.0</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">name_servers</span>:
</span></span><span style="display:flex;"><span>      - <span style="color:#ae81ff">10.86.25.254</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">uefi</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">netboot</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowPXE</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">allowWorkflow</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">userData</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    #cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    packages:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - openssh-server
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - python3
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - sudo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    ssh_pwauth: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    disable_root: true
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    allow_public_ssh_keys: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    timezone: &#34;Europe/Berlin&#34;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    users:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - name: ansible-user
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        shell: /bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        ssh_authorized_keys:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          - from=&#34;192.0.2.100&#34; ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt user@foo
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        sudo: ALL=(ALL:ALL) ALL
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    runcmd:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl enable ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      - systemctl start ssh.service
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">    power_state:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      delay: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      timeout: 2
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      mode: reboot</span>
</span></span></code></pre></div><p>The first change necessary here is to add the <code>spec.interfaces[].dhcp.ip</code> section.
This is one of the suboptimal parts of Tinkerbell. I&rsquo;m not actually having
Tinkerbell do the IPAM part of DHCP; that&rsquo;s still left to my OPNsense router.
But I still needed to specify the VM&rsquo;s IP here, because the EC2 API, and thus
Tootles, determines which metadata to return by the source IP of the request.
So if you just send a request for <code>/2009-04-04/meta-data</code> from an arbitrary host,
you won&rsquo;t get a response. The request needs to come from an IP that belongs to a
Hardware object. Another downside is that the <code>spec.metadata</code> section needs to
be defined manually; it&rsquo;s not automatically generated from the rest of the Hardware
object.</p>
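<p>Two small helpers make this easier to work with. The first queries Tootles&rsquo; EC2-style metadata endpoint, which only returns something when run on the provisioned machine itself. The second checks that the IP in <code>spec.metadata</code> matches the one in <code>spec.interfaces</code>, since both have to be kept in sync by hand. This is a sketch: <code>TOOTLES_URL</code> (defaulting to the Tinkerbell IP and port used in the cloud-init config later in this post), the function names and the default namespace are assumptions, and only the first entry of each list is compared:</p>

```shell
# Sketch: helpers for inspecting what Tootles serves and whether a Hardware
# object's IPs line up. TOOTLES_URL and the function names are assumptions.
TOOTLES_URL="${TOOTLES_URL:-http://203.0.113.200:7172}"

# URL for a metadata key such as "hostname"; no argument gives the key listing.
metadata_url() {
  printf '%s/2009-04-04/meta-data/%s\n' "${TOOTLES_URL%/}" "${1:-}"
}

# Fetch a key. Run this on the machine itself: Tootles answers based on the
# source IP, so from other hosts there is no matching Hardware object.
get_metadata() {
  curl -fsS "$(metadata_url "${1:-}")"
}

# Compare spec.metadata.instance.ips[0] with spec.interfaces[0].dhcp.ip,
# which have to be kept in sync by hand.
# Usage: hw_ips_match NAME [NAMESPACE]   (namespace defaults to "default")
hw_ips_match() {
  name="$1"
  ns="${2:-default}"
  meta_ip=$(kubectl get hardware "$name" -n "$ns" \
    -o jsonpath='{.spec.metadata.instance.ips[0].address}') || return 2
  dhcp_ip=$(kubectl get hardware "$name" -n "$ns" \
    -o jsonpath='{.spec.interfaces[0].dhcp.ip.address}') || return 2
  if [ -z "$meta_ip" ] || [ "$meta_ip" != "$dhcp_ip" ]; then
    echo "${name}: metadata IP '${meta_ip}' does not match DHCP IP '${dhcp_ip}'" >&2
    return 1
  fi
  echo "${name}: IPs match (${meta_ip})"
}
```

<p>On the VM, <code>get_metadata hostname</code> would be expected to print the hostname from the Hardware object, while <code>hw_ips_match test-vm</code> can be run anywhere kubectl has access to the cluster.</p>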
<p>Then we come to the actually interesting part, the <code>spec.userData</code>. This is the
cloud-init config returned to the machine upon request. As I&rsquo;ve noted above,
the main goal here is to configure the new machine so I can run my main Ansible
playbook on it. I make sure that my Ansible user exists, has my SSH key
and is in the sudoers file. In addition, I make sure that SSH is enabled and started,
and finally reboot the machine. The <code>#cloud-config</code> comment is load-bearing,
by the way: without it, cloud-init won&rsquo;t accept the configuration.</p>
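<p>Since cloud-init won&rsquo;t accept the user-data without that header, a quick pre-flight check before applying a Hardware object can save a provisioning round trip. A minimal sketch that only inspects the first line (the function name is made up); for proper validation, <code>cloud-init schema --config-file</code> can be used on a machine with cloud-init installed:</p>

```shell
# Sketch: pre-flight check that a user-data file starts with the "#cloud-config"
# header cloud-init needs to treat it as cloud-config.
# Usage: is_cloud_config FILE
is_cloud_config() {
  [ -r "$1" ] || { echo "Cannot read $1" >&2; return 2; }
  # Only the very first line counts; drop a trailing CR from Windows line endings
  first=$(head -n 1 "$1" | tr -d '\r')
  if [ "$first" = "#cloud-config" ]; then
    return 0
  fi
  echo "$1: first line is '${first}', expected '#cloud-config'" >&2
  return 1
}
```

<p>For example, after <code>kubectl get hardware test-vm -o jsonpath='{.spec.userData}' &gt; user-data</code>, running <code>is_cloud_config user-data</code> would flag a mistyped header.</p>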
<p>So far so good, but this configuration still did not work. The central issue was
that the machine did not have a proper network configuration: <code>ip addr</code>
showed the Ethernet interface as down. This confused me, because the
cloud-init documentation clearly states that, when no explicit network config is
given, a fallback config running DHCP on the interface most likely to be connected will be applied.</p>
<p>So I went searching. And that wasn&rsquo;t easy, because it turns out that Ubuntu&rsquo;s
server-minimal install is so minimal that it doesn&rsquo;t even include vi or nano. I had to
look at files with <code>cat</code>. But I was finally able to find what I was looking for.
In <code>/etc/netplan/50-cloud-init.yaml</code>, I found this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">network</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">version</span>: <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ethernets</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">ens3</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">dhcp4</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>That file was generated by cloud-init from the network config the installer left behind
during the Packer install run. But of course, the NIC had a different name in that
environment than on the final VM.
To remedy this, I added another task to the Tinkerbell Template, overwriting the
installer&rsquo;s cloud-init network config so that the defaults apply:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;remove installer network config&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/cloud.cfg.d/90-installer-network.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#ae81ff">0600</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#ae81ff">0700</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      # Removed during provisioning</span>
</span></span></code></pre></div><p>This task runs after the image has been dd&rsquo;d onto the disk, mounts the root
partition and overwrites the file&rsquo;s content with a comment.</p>
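<p>When debugging a freshly imaged disk by hand, the action&rsquo;s environment variables map roughly onto a handful of shell commands. The following is a sketch of that manual equivalent, not the writefile action&rsquo;s implementation. It assumes the partition is already mounted (for the <code>/dev/sda</code> disk here, <code>formatPartition</code> with index 2 should resolve to <code>/dev/sda2</code>), and applying <code>DIRMODE</code> only to a newly created directory is an assumption about the semantics:</p>

```shell
# Sketch: manual equivalent of the writefile action's settings, applied to a
# root filesystem that is already mounted at ROOT. Function name is hypothetical.
# Usage: write_file ROOT DEST_PATH UID GID MODE DIRMODE CONTENTS
write_file() {
  root="$1"; dest="${root%/}$2"; uid="$3"; gid="$4"
  mode="$5"; dirmode="$6"; contents="$7"
  dir=$(dirname "$dest")
  # Assumption: DIRMODE only applies when the parent directory has to be created
  # (mkdir -p may create several levels; only the last one gets DIRMODE here)
  if [ ! -d "$dir" ]; then
    mkdir -p "$dir" || return 1
    chmod "$dirmode" "$dir" || return 1
  fi
  # Replace any existing content, as the task above does with CONTENTS
  printf '%s\n' "$contents" > "$dest" || return 1
  chown "$uid:$gid" "$dest" || return 1
  chmod "$mode" "$dest"
}
```

<p>The task above would then correspond to something like <code>write_file /mnt /etc/cloud/cloud.cfg.d/90-installer-network.cfg 0 0 0600 0700 '# Removed during provisioning'</code>.</p>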
<p>But even after that, my cloud-init user-data was still not being applied.
After some more searching, I found the file <code>/run/cloud-init/cloud-init-generator.log</code>
with the following content:</p>
<pre tabindex="0"><code>ds-identify rc=1
cloud-init is enabled but no datasource found, disabling
</code></pre><p>I could have avoided this problem by following Tinkerbell&rsquo;s <a href="https://tinkerbell.org/docs/integrations/cloudinit/">cloud-init docs</a>.
There, the example contains two more tasks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;add cloud-init config&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/cloud.cfg.d/10_tinkerbell.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#e6db74">&#34;0700&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#e6db74">&#34;0600&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#e6db74">&#34;0&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      datasource:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        Ec2:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          metadata_urls: [&#34;http://203.0.113.200:7172&#34;]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">          strict_id: false
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      manage_etc_hosts: localhost
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      warnings:
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">        dsid_missing_source: off</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#e6db74">&#34;add cloud-init ds-identify&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">image</span>: <span style="color:#ae81ff">quay.io/tinkerbell/actions/writefile:latest</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">90</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_DISK</span>: {{ <span style="color:#ae81ff">`{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">FS_TYPE</span>: <span style="color:#ae81ff">ext4</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DEST_PATH</span>: <span style="color:#ae81ff">/etc/cloud/ds-identify.cfg</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">UID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">GID</span>: <span style="color:#ae81ff">0</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">MODE</span>: <span style="color:#ae81ff">0600</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">DIRMODE</span>: <span style="color:#ae81ff">0700</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">CONTENTS</span>: |<span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">      datasource: Ec2</span>
</span></span></code></pre></div><p>The first task adds some basic cloud-init configuration, most importantly the
URL of the metadata service. For most cloud providers, this is a hardcoded IP
across their entire cloud, but here it is Tinkerbell&rsquo;s public IP as configured
in the Helm chart&rsquo;s <code>values.yaml</code>. The second task hardcodes
the data source as <code>Ec2</code> for ds-identify, because cloud-init&rsquo;s default search mechanism checks
the aforementioned default IP, where it won&rsquo;t find any metadata service in my
Homelab.</p>
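<p>To check for this particular failure after a boot, a small helper can look for the message from the generator log quoted above. A sketch; the function name is made up, the default path assumes the standard <code>/run/cloud-init/</code> runtime directory, and <code>cloud-init status --long</code> gives a broader picture on the machine itself:</p>

```shell
# Sketch: report whether cloud-init's systemd generator disabled cloud-init
# because ds-identify found no data source (the "rc=1" case from the log above).
# Usage: check_datasource [GENERATOR_LOG]
check_datasource() {
  log="${1:-/run/cloud-init/cloud-init-generator.log}"
  [ -r "$log" ] || { echo "No generator log at $log" >&2; return 2; }
  if grep -q 'no datasource found' "$log"; then
    echo "cloud-init was disabled: no datasource found (see $log)" >&2
    return 1
  fi
  echo "No 'no datasource found' message in $log"
}
```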
<p>With all of this configuration done, I was able to delete the VM one last time,
reset the Workflow object of Tinkerbell, and recreate the VM. After a couple of
minutes, I was greeted with a fully functional VM, ready for Ansible, with no
further manual intervention from my side.</p>
<h2 id="final-thoughts">Final thoughts</h2>
<p>I really like what I&rsquo;ve seen from Tinkerbell so far. I also like how well
cloud-init works. Even if I don&rsquo;t end up deploying Tinkerbell, I will likely
change my new host setup to use a generic image and then do the customization
with cloud-init.</p>
<p>The next steps will be the more complicated ones. There are two basic things
I will need to figure out. First, how to boot Raspberry Pi 4 and 5 into iPXE so
I can use Tinkerbell for provisioning them. From some initial research, it looks
like that should be possible. The bigger issue might be diskless hosts. Sure,
I can set up iPXE and provisioning - but the problem is then how to tell them
to boot into their own system, instead of Tinkerbell&rsquo;s provisioning, once they&rsquo;ve
been properly set up.</p>
<p>Let&rsquo;s see how those next experiments turn out.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part II: Lab Setup</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/</link>
      <pubDate>Thu, 12 Jun 2025 00:30:11 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-2-lab-setup/</guid>
      <description>Setting up the lab for Tinkerbell</description>
      <content:encoded><![CDATA[<p>A description of my lab setup for tinkering with Tinkerbell.</p>
<p>This is part 2 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">Tinkerbell series</a>.</p>
<p><del>For my Tinkerbell tinkering lab</del> Actually, no. Let&rsquo;s start with: How
did I not come up with &ldquo;tinkering with Tinkerbell&rdquo; until the second post
of this series? You may tsk tsk tsk disapprovingly at your screen now.</p>
<p>For my Tinkerbell tinkering lab, I decided to use my desktop machine.
This is because previous work on network booting has shown that I definitely want
direct access to the netbooting machine&rsquo;s TTY, and that&rsquo;s easiest when it runs
on my desktop. It also makes things like packet capturing easier. So I needed the
following things in my lab setup:</p>
<ol>
<li>Fresh VLAN</li>
<li>VM tooling on my desktop</li>
<li>Ubuntu server VM for Tinkerbell</li>
<li>k3s, to run Tinkerbell</li>
</ol>
<p>In this post, I will go into a bit more detail on what that setup looks like.</p>
<h2 id="new-vlan">New VLAN</h2>
<p>In my Homelab VLAN, I&rsquo;ve already got two DHCP servers. One is from my OPNsense
router, providing the IPAM (IP Address Management) side of things. Then there&rsquo;s
also a <a href="https://thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> instance running in
proxy mode and supplying the necessary info for netbooting, also serving as a
TFTP server.</p>
<p>This is definitely something I will need to tackle during the labbing phase - what
to do about diskless netbooting machines? For their first boot, they should go
with Tinkerbell for initial provisioning. But all subsequent boots should then
use the dnsmasq server and boot their normal kernel.</p>
<p>But for now, I&rsquo;m avoiding having to think about this by creating a separate VLAN
so Tinkerbell&rsquo;s DHCP doesn&rsquo;t disrupt the netbooting hosts. If you&rsquo;re curious about
the details, head to <a href="https://blog.mei-home.net/posts/vlans/">this post</a>. For now, suffice
it to say that I configured another fresh VLAN, let&rsquo;s say with the ID <code>512</code>, and
added it as a trunk VLAN to the router&rsquo;s main interface. Same for the rest of the
network path to my desktop. There, the VLAN is also configured as trunked, so that
packets arrive on the host with their VLAN tag intact, allowing me to configure
a special interface on the host for just those packets. Importantly, I did not
set the desktop&rsquo;s switch port to autotag incoming packets (coming from the desktop)
with that VLAN ID. So all packets for this VLAN come into the
host tagged, and they also have to leave the host tagged.</p>
<p>Because I intended to have the lab up only while actively working on it, I didn&rsquo;t
make any config file changes, but instead wrote a small bash script to set up the
networking via <code>ip</code> commands:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span><span style="color:#75715e">#!/bin/bash
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span>
</span></span><span style="display:flex;"><span>LAN<span style="color:#f92672">=</span>eth0
</span></span><span style="display:flex;"><span>VLANID<span style="color:#f92672">=</span><span style="color:#ae81ff">512</span>
</span></span><span style="display:flex;"><span>VLAN<span style="color:#f92672">=</span>$LAN.$VLANID
</span></span><span style="display:flex;"><span>BRIDGE<span style="color:#f92672">=</span>br
</span></span><span style="display:flex;"><span>IP<span style="color:#f92672">=</span>203.0.113.1/32
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> setup_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip link add link $LAN name $VLAN type vlan id $VLANID
</span></span><span style="display:flex;"><span>  ip link add name $BRIDGE type bridge
</span></span><span style="display:flex;"><span>  ip link set $VLAN master $BRIDGE
</span></span><span style="display:flex;"><span>  ip link set $BRIDGE up
</span></span><span style="display:flex;"><span>  ip link set $VLAN up
</span></span><span style="display:flex;"><span>  ip addr add $IP dev $BRIDGE
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">function</span> teardown_net <span style="color:#f92672">{</span>
</span></span><span style="display:flex;"><span>  ip link set $BRIDGE down
</span></span><span style="display:flex;"><span>  ip link set $VLAN down
</span></span><span style="display:flex;"><span>  ip link delete $BRIDGE
</span></span><span style="display:flex;"><span>  ip link delete $VLAN
</span></span><span style="display:flex;"><span><span style="color:#f92672">}</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> <span style="color:#f92672">[[</span> $# -gt <span style="color:#ae81ff">0</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">case</span> $1 in
</span></span><span style="display:flex;"><span>    up<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      setup_net
</span></span><span style="display:flex;"><span>      shift
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    down<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      teardown_net
</span></span><span style="display:flex;"><span>      shift
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>    *<span style="color:#f92672">)</span>
</span></span><span style="display:flex;"><span>      echo <span style="color:#e6db74">&#34;Unknown argument&#34;</span>
</span></span><span style="display:flex;"><span>      exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>      ;;
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">esac</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span></code></pre></div><p>This script creates two new network devices. The first one, called <code>eth0.512</code>,
serves as the VLAN interface, sitting &ldquo;on top&rdquo; of <code>eth0</code>, which is my physical
NIC. The <code>PHYSICAL.VLAN</code> naming is only a convention, not a requirement.
The second is the <code>br</code> bridge, which can be imagined as a &ldquo;virtual switch&rdquo; simulated
by the Linux kernel. Multiple interfaces can be connected to it, and because the
<code>eth0.512</code> interface is one of them, every other interface connected to the bridge
has access to the rest of the network.</p>
<p>This is a simple bridge: it is not aware of VLANs at all, so
packets sent between the hosts on the bridge are not tagged. But any packets
which go out into the wider network do so via the <code>eth0.512</code> interface, and
consequently get tagged with VLAN ID <code>512</code>.</p>
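<p>To make &ldquo;tagged with VLAN ID 512&rdquo; a bit more concrete: per IEEE 802.1Q, a tagged Ethernet frame carries four extra bytes after the source MAC address. They consist of the tag protocol identifier <code>0x8100</code>, followed by a 16-bit tag control field made up of a 3-bit priority (PCP), a 1-bit drop eligible indicator (DEI) and the 12-bit VLAN ID. The following sketch, with a function name made up purely for illustration, computes those four bytes, assuming priority and DEI of 0 unless given:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Illustrative only: print the 4-byte 802.1Q tag (TPID + TCI) as hex.
# TCI layout: PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits).
dot1q_tag() {
  local vid=$1 pcp=${2:-0} dei=${3:-0}
  # VLAN IDs 0 and 4095 are reserved; PCP is 0-7, DEI is 0 or 1.
  if [[ $vid -lt 1 || $vid -gt 4094 || $pcp -gt 7 || $dei -gt 1 ]]; then
    return 1
  fi
  # Multiplying by 8192 (2^13) and 4096 (2^12) shifts PCP and DEI into place.
  local tci=$(( pcp * 8192 + dei * 4096 + vid ))
  printf '%04x%04x\n' $(( 0x8100 )) "$tci"
}

dot1q_tag 512    # prints 81000200
</code></pre></div>
<p>So for my lab VLAN, every frame leaving via <code>eth0.512</code> should carry the bytes <code>81 00 02 00</code>.</p>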
<p>Now, one very important fact is that the IP address needs to be assigned to the
bridge, not to the VLAN interface. I initially had it assigned to the VLAN
interface, and it did not work at all: packets from the new VLAN 512 interface
never arrived at the router, and packets sent from other hosts to the IP
assigned to the interface never arrived either.
I&rsquo;m honestly not really able to explain why that was. Which tells
me, yet again, that at some point I need to take a tour through the Linux
kernel&rsquo;s networking stack.</p>
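<p>A likely explanation, which I have not verified against the kernel source: once <code>eth0.512</code> is a port of <code>br</code>, the bridge code takes over every frame received on it, and frames meant for the host itself are passed up the stack as if they had arrived on <code>br</code>. An address on the port is then effectively bypassed. To catch that mistake early, a hypothetical check like the following sketch looks up which device actually holds the lab IP, based on the one-line-per-address output of <code>ip -o -4 addr show</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Hypothetical check: print the device holding a given IPv4 address/prefix.
# Expects "ip -o -4 addr show" output on stdin, where field 2 is the
# device name and field 4 the address in CIDR notation.
ip_owner() {
  awk -v addr="$1" '$4 == addr { print $2; found = 1 } END { exit !found }'
}

# Warn if the lab IP sits anywhere other than on the bridge.
check_lab_ip() {
  local ip=$1 bridge=$2 owner
  owner=$(ip -o -4 addr show | ip_owner "$ip")
  if [[ $owner != "$bridge" ]]; then
    echo "WARNING: $ip is on '${owner:-nothing}', expected '$bridge'"
    return 1
  fi
}

# Usage on the lab host, after "up":
#   check_lab_ip 203.0.113.1/32 br
</code></pre></div>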
<h2 id="vm-setup">VM setup</h2>
<p>I had to think a lot about this part, surprisingly. My normal go-to tool for VMs
has always been LXD. I ran my VMs via it for a couple of years during the
&ldquo;one host, multiple VMs&rdquo; phase of my Homelab. Then I pulled it out again to supply
some VMs during the k8s migration. I&rsquo;m pretty comfortable with it, and I like that
it has a Terraform provider so I could put my VM configs under version control.</p>
<p>In <a href="https://blog.mei-home.net/posts/testvm-for-netbooting/">some previous desktop VM&rsquo;ing</a>,
I had opted to set up the VM directly with the <code>qemu-system</code> command. But I wanted
a little bit more structure this time, because I expect this lab to last a bit
longer.</p>
<p>These were the two extremes I was thinking about - LXD (or rather, <a href="https://linuxcontainers.org/incus/">Incus</a>),
requiring a daemon to run and some additional setup, or a bash script for
launching the VM via <code>qemu-system</code>. I was looking for something in the middle -
without a daemon, but a bit less DIY than a bash script.</p>
<p>Initially, <a href="https://developer.hashicorp.com/vagrant">Vagrant</a> looked exactly
like what I was looking for. I was a bit dismayed when I saw that it was seemingly
written in Ruby though. Nothing wrong with Ruby, but it&rsquo;s not something I have
installed on my desktop. But I went ahead and got right to writing a Vagrantfile,
just to find this note on <a href="https://documentation.ubuntu.com/public-images/public-images-explanation/vagrant/#support">Ubuntu&rsquo;s Vagrant page</a>:</p>
<blockquote>
<p>Vagrant has been dropped by Ubuntu due to the adoption of the Business Source License (BSL). Following this change, Canonical will no longer publish Vagrant images directly starting with Ubuntu 24.04 LTS (Noble Numbat).</p></blockquote>
<p>So much for that idea. And I didn&rsquo;t want to run any other distro, as the entire
Homelab is based on Ubuntu, and at least for now I don&rsquo;t intend to change that.
I then looked into tools like <a href="https://www.libvirt.org/manpages/virsh.html">virsh</a>.
But that turned out to also require a daemon (libvirt&rsquo;s). At that point I decided
that Incus really was the best choice - at least I was already experienced with
it, so I could spend more time on setting up Tinkerbell and less on setting up
the lab.</p>
<p>With that decision made, I ran the Incus install:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>emerge -av incus
</span></span></code></pre></div><p>The Gentoo Wiki has a <a href="https://wiki.gentoo.org/wiki/Incus">good page on Incus</a>.
Following it, I also added my user to the required groups for being allowed to
use Incus directly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>usermod --append --groups incus,incus-admin &lt;MYUSER&gt;
</span></span></code></pre></div><p>Then I could launch Incus like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>rc-service incus start
</span></span></code></pre></div><p>As mentioned, I only wanted the lab to be up while I&rsquo;m actually working with it, so
I did not configure Incus to start automatically.</p>
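<p>One thing worth knowing about the <code>usermod</code> step: new group memberships only apply to sessions started afterwards, so a shell that was already open can&rsquo;t use Incus directly until I log in again (or use <code>newgrp</code>). <code>id -nG USER</code> reads the group database, while a plain <code>id -nG</code> shows the groups of the current session, so comparing the two tells the cases apart. A small sketch of such a check; the function names are hypothetical and not part of any Incus tooling:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Return 0 if the first argument exactly matches one of the remaining ones.
in_list() {
  local needle=$1 item
  shift
  for item in "$@"; do
    if [[ $item == "$needle" ]]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical preflight check for the groups added with usermod.
check_incus_groups() {
  local user group status=0
  user=$(id -un)
  for group in incus incus-admin; do
    # Unquoted on purpose: split the group lists into separate words.
    # "id -nG USER" is the group database, "id -nG" the current session.
    if ! in_list "$group" $(id -nG "$user"); then
      echo "$user is not in $group, run usermod first"
      status=1
    elif ! in_list "$group" $(id -nG); then
      echo "$user is in $group, but this session predates it, log in again"
      status=1
    fi
  done
  return $status
}

# Usage, before running incus commands as my normal user:
#   check_incus_groups
</code></pre></div>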
<p>Finally, I initialized Incus with this command:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>incus admin init
</span></span></code></pre></div><p>I basically said &ldquo;no&rdquo; to everything, so I could set up stuff like default
networking and the default storage provider in OpenTofu later and put that config
under version control.</p>
<h2 id="setting-up-the-master-vm-with-opentofu">Setting up the Master VM with OpenTofu</h2>
<p>To configure Incus, I made use of the <a href="https://search.opentofu.org/provider/lxc/incus/v0.3.1">OpenTofu Incus provider</a>.
I didn&rsquo;t use the Incus CLI because I wanted to put the config under source control.
Even though I&rsquo;m still on <a href="https://developer.hashicorp.com/terraform">Terraform</a>
for my Homelab as a whole, I decided to go with <a href="https://opentofu.org/">OpenTofu</a>
for the lab. I intended to keep the two states, Home(prod)lab and actual lab,
separate anyway. And I saw this as a good chance to kick the tires on OpenTofu.</p>
<p>My OpenTofu <code>main.tf</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">terraform</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">backend</span> <span style="color:#e6db74">&#34;local&#34;</span> {
</span></span><span style="display:flex;"><span>    path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;.terraform/terraform-main.tfstate&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">required_providers</span> {
</span></span><span style="display:flex;"><span>    incus <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;lxc/incus&#34;</span>
</span></span><span style="display:flex;"><span>      version <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;0.3.1&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">provider</span> <span style="color:#e6db74">&#34;incus&#34;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">remote</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local&#34;</span>
</span></span><span style="display:flex;"><span>    default <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    scheme <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;unix&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Nothing special here at all. So next, setting up some defaults for the VMs. First
step: Some storage. I just went with local storage - in the name of not overcomplicating
the lab setup unnecessarily (yes, I <em>can</em> see that smirk on your face right now):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_storage_pool&#34; &#34;local-dir&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;local-dir&#34;</span>
</span></span><span style="display:flex;"><span>  description <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;Local host storage pool&#34;</span>
</span></span><span style="display:flex;"><span>  driver <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;dir&#34;</span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    source <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/var/lib/incus/storage-pools/local-dir&#34;</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Next comes the base profile for the VMs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_profile&#34; &#34;base&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;base&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;boot.autostart&#34; <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>    &#34;cloud-init.vendor-data&#34; <span style="color:#f92672">=</span> <span style="color:#960050;background-color:#1e0010">&lt;&lt;-</span><span style="color:#66d9ef">EOT</span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">#cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">users</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">name</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">ansible</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">user</span>
</span></span><span style="display:flex;"><span>    sudo: ALL<span style="color:#f92672">=</span>(<span style="color:#66d9ef">ALL</span><span style="color:#960050;background-color:#1e0010">:</span><span style="color:#66d9ef">ALL</span>) <span style="color:#66d9ef">ALL</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">ssh_authorized_keys</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>      - from<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;192.0.2.100&#34;</span> <span style="color:#66d9ef">ssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">ed25519</span> <span style="color:#66d9ef">AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt</span> <span style="color:#66d9ef">user</span><span style="color:#960050;background-color:#1e0010">@</span><span style="color:#66d9ef">foo</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">shell</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bin</span><span style="color:#960050;background-color:#1e0010">/</span><span style="color:#66d9ef">bash</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">packages</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">sudo</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">python3</span>
</span></span><span style="display:flex;"><span>  <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">openssh</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">server</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">chpasswd</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">expire</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">users</span><span style="color:#960050;background-color:#1e0010">:</span>
</span></span><span style="display:flex;"><span>    <span style="color:#960050;background-color:#1e0010">-</span> <span style="color:#66d9ef">name</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">ansible</span><span style="color:#960050;background-color:#1e0010">-</span><span style="color:#66d9ef">user</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">password</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">password123</span>
</span></span><span style="display:flex;"><span>      <span style="color:#66d9ef">type</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">text</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;network&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;nic&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      nictype <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;bridged&#34;</span>
</span></span><span style="display:flex;"><span>      parent <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;br&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Let&rsquo;s start with the network config. Here, I&rsquo;m configuring the VM to make use
of the <code>br</code> bridge I created above. The <code>bridged</code> NIC type creates a NIC
which is attached to the given bridge device, meaning the VM is connected to that bridge
and can use it to communicate with other connected hosts. In my
setup, this config also allows all connected devices to communicate with the
outside world.</p>
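<p>Whether the VM&rsquo;s NIC really ended up on the bridge can be checked from the host: for every Linux bridge, sysfs lists its ports as entries under <code>/sys/class/net/BRIDGE/brif/</code>. With the lab up and the VM running, I would expect <code>br</code> to show <code>eth0.512</code> plus one host-side device per running VM. A minimal sketch of a hypothetical helper; the optional second argument only exists so it can be pointed at a fake directory tree for testing:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Hypothetical helper: list the ports of a Linux bridge via sysfs.
bridge_ports() {
  local bridge=$1
  local sysfs=${2:-/sys/class/net}
  local dir=$sysfs/$bridge/brif
  local port
  if [[ ! -d $dir ]]; then
    return 1  # not a bridge, or it does not exist (yet)
  fi
  for port in "$dir"/*; do
    # Skip the literal pattern if the bridge has no ports at all.
    if [[ -e $port ]]; then
      echo "${port##*/}"
    fi
  done
}

# Usage on the lab host:
#   bridge_ports br
</code></pre></div>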
<p>Then there&rsquo;s also the <code>vendor-data</code> config. This is a <a href="https://cloud-init.io/">cloud-init</a>
configuration file. Cloud-init was introduced for Ubuntu, but has been adopted
by a number of other distributions as well. Its main use is the initial
configuration of a generic OS image. On systems supporting cloud-init, it
generally runs in several stages during boot, e.g. as multiple systemd services.
Those stages can configure the network, create users, install packages,
set passwords and do a whole host of other things. Generally, these configs are only executed once, during the
initial boot of the machine. Switching to cloud-init is one of the goals during
my Tinkerbell migration. Up to now, I&rsquo;ve been creating individual images for
each new host, which contained pretty much only the above configuration. Which
was a bit of a waste, considering that I really only needed to do some very
light customization, with the sole goal being that after first boot, the machine
would be ready for my main Ansible playbook to run.</p>
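<p>A detail that&rsquo;s easy to get wrong with these inline configs: cloud-init only treats user-data or vendor-data as cloud-config if the very first line is exactly <code>#cloud-config</code>, which is why both heredocs in this post start with it. A sketch of a hypothetical pre-apply check for a config saved to a file, which additionally runs cloud-init&rsquo;s own schema validation (<code>cloud-init schema --config-file</code>, available in recent cloud-init versions) when the <code>cloud-init</code> binary happens to be installed locally:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Hypothetical check for a cloud-config file before handing it to Incus.
check_cloud_config() {
  local file=$1 first
  first=$(head -n 1 "$file")
  if [[ $first != "#cloud-config" ]]; then
    echo "$file: first line must be #cloud-config, got: $first"
    return 1
  fi
  # Schema validation, only if cloud-init is available on this machine.
  if [[ -n $(command -v cloud-init) ]]; then
    cloud-init schema --config-file "$file"
  fi
}

# Usage:
#   check_cloud_config vendor-data.yaml
</code></pre></div>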
<p>This particular cloud-init config does exactly that. It installs Python and
the OpenSSH server. Surprisingly, the Incus Ubuntu images don&rsquo;t come with SSH
configured by default. Then I&rsquo;m creating the <code>ansible-user</code> user, which is the
user all of my Ansible playbooks use for connecting to the hosts in my Homelab.
The config adds the user itself, sets the shell and adds my Ansible SSH key
to the <code>authorized_keys</code>, allowing access only from my Command and Control host.
The user also has full sudo access.
Finally, I&rsquo;m setting a simple password initially, which is then changed to the
actual password during the initial Ansible playbook run. This is probably a bit
unsafe, and I plan to look into a better approach. For now it serves reasonably
well, because I need a password for sudo access even for the first playbook run.</p>
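<p>One possible improvement, not something I&rsquo;ve tested in this lab: cloud-init&rsquo;s <code>chpasswd</code> module also accepts <code>type: hash</code> with a crypt-style hash instead of <code>type: text</code>. That way the clear-text password never ends up in the OpenTofu config or state, only its hash does. A sketch generating a SHA-512 crypt hash with <code>openssl passwd -6</code> (available in OpenSSL 1.1.1 and later) and printing a matching <code>chpasswd</code> snippet; the function name is hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Hypothetical helper: print a cloud-init chpasswd snippet with a SHA-512
# crypt hash instead of the clear-text password.
chpasswd_snippet() {
  local user=$1 password=$2 hash
  # -6 selects SHA-512 crypt; -stdin keeps the password out of the process
  # list (printf is a bash builtin, so it never shows up there either).
  hash=$(printf '%s\n' "$password" | openssl passwd -6 -stdin) || return 1
  printf 'chpasswd:\n  expire: false\n  users:\n'
  printf '    - name: %s\n      password: %s\n      type: hash\n' "$user" "$hash"
}

# Usage:
#   read -rs -p "Password: " pw; echo
#   chpasswd_snippet ansible-user "$pw"
</code></pre></div>
<p>The hash only contains <code>$</code>, letters, digits, <code>.</code> and <code>/</code>, so it should be safe to paste into the HCL heredoc without triggering OpenTofu&rsquo;s <code>${...}</code> interpolation.</p>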
<p>I&rsquo;ve also got a small second profile, for creating hosts with disks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_profile&#34; &#34;disk-vms&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk-vms&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      pool <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${incus_storage_pool.local-dir.name}&#34;</span>
</span></span><span style="display:flex;"><span>      size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;20GB&#34;</span>
</span></span><span style="display:flex;"><span>      path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>These two profiles are separate because I will also need to test how my diskless
netboot setup works with Tinkerbell provisioning. And honestly, I&rsquo;ve got a bad
feeling about it. But that&rsquo;s for the future. &#x1f62c;</p>
<p>The last thing to do: Actually create the VM.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-hcl" data-lang="hcl"><span style="display:flex;"><span><span style="color:#66d9ef">resource</span> <span style="color:#e6db74">&#34;incus_instance&#34; &#34;master&#34;</span> {
</span></span><span style="display:flex;"><span>  name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;master&#34;</span>
</span></span><span style="display:flex;"><span>  type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;virtual-machine&#34;</span>
</span></span><span style="display:flex;"><span>  image <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;images:ubuntu/24.04/cloud&#34;</span>
</span></span><span style="display:flex;"><span>  running <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>  profiles <span style="color:#f92672">=</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;${incus_profile.base.name}&#34;</span>
</span></span><span style="display:flex;"><span>  ]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">device</span> {
</span></span><span style="display:flex;"><span>    name <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;root&#34;</span>
</span></span><span style="display:flex;"><span>    type <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;disk&#34;</span>
</span></span><span style="display:flex;"><span>    properties <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>      pool <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;${incus_storage_pool.local-dir.name}&#34;</span>
</span></span><span style="display:flex;"><span>      size <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;50GB&#34;</span>
</span></span><span style="display:flex;"><span>      path <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  config <span style="color:#f92672">=</span> {
</span></span><span style="display:flex;"><span>    &#34;limits.cpu&#34; <span style="color:#f92672">=</span> <span style="color:#ae81ff">6</span>
</span></span><span style="display:flex;"><span>    &#34;limits.memory&#34; <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;16GB&#34;</span>
</span></span><span style="display:flex;"><span>    &#34;cloud-init.user-data&#34; <span style="color:#f92672">=</span> <span style="color:#960050;background-color:#1e0010">&lt;&lt;-</span><span style="color:#66d9ef">EOT</span><span style="color:#75715e">
</span></span></span><span style="display:flex;"><span><span style="color:#75715e">#cloud-config
</span></span></span><span style="display:flex;"><span><span style="color:#75715e"></span><span style="color:#66d9ef">hostname</span><span style="color:#960050;background-color:#1e0010">:</span> <span style="color:#66d9ef">master</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">EOT</span>
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Nothing special about this config: it uses the previously discussed <code>base</code>
profile and adds a 50 GB root disk to it. I&rsquo;ve configured it with 6 CPUs and 16 GB of RAM, similar
to the Pi 5 which will ultimately host the setup.</p>
<p>A single <code>tofu apply</code> later, I had the main VM up and running, ready for the k3s
install.</p>
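<p>Right after <code>tofu apply</code> returns, the VM is running, but cloud-init may still be installing packages, and <code>incus exec</code> only works once the Incus agent inside the VM is up. Before pointing Ansible at the VM, a small retry loop helps, followed by <code>cloud-init status --wait</code>, which blocks until cloud-init has finished. A sketch with a hypothetical <code>retry</code> helper:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell">#!/bin/bash
# Hypothetical helper: run a command up to N times, one second apart,
# until it succeeds. Returns 1 if all attempts fail.
retry() {
  local tries=$1
  shift
  while (( tries-- )); do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is left.
    if (( tries )); then
      sleep 1
    fi
  done
  return 1
}

# Usage on the lab host: wait up to ~2 minutes for the agent, then for cloud-init.
#   retry 120 incus exec master -- true
#   incus exec master -- cloud-init status --wait
</code></pre></div>
<p>Keeping <code>cloud-init status --wait</code> outside the retry avoids re-running it in a loop if cloud-init itself finished with an error.</p>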
<h2 id="setting-up-k3s">Setting up k3s</h2>
<p>Tinkerbell is very much a Kubernetes application. Plus, I had started thinking
that standardizing on deploying everything possible in Kubernetes would be a
good thing. So regardless of whether Tinkerbell ultimately gets deployed or not,
I want a Kubernetes cluster on my cluster master host. After looking through the
current offerings, I decided on <a href="https://k3s.io/">k3s</a> as the Kubernetes distro
to use. Mostly because it seems to be the standard. While I normally instinctively
reach for the &ldquo;vanilla&rdquo; version of everything, I already know that kubeadm is
not exactly friendly to single-node deployments.</p>
<p>For the deployment on the test VM, I adapted <a href="https://github.com/k3s-io/k3s-ansible/tree/master">this Ansible role</a>.
With my adaptations, the role&rsquo;s <code>tasks/main.yml</code> looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Populate service facts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.service_facts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">get k3s installed version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.command</span>: <span style="color:#ae81ff">k3s --version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">k3s_version_output</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">changed_when</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ignore_errors</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">set k3s installed version</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not ansible_check_mode and k3s_version_output.rc == 0</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.set_fact</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">installed_k3s_version</span>: <span style="color:#e6db74">&#34;{{ k3s_version_output.stdout_lines[0].split(&#39; &#39;)[2] }}&#34;</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download artifact only if needed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">not ansible_check_mode and ( k3s_version_output.rc != 0 or installed_k3s_version is version(k3s_version, &#39;&lt;&#39;) )</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">block</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download K3s install script</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ansible.builtin.get_url</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">url</span>: <span style="color:#ae81ff">https://get.k3s.io/</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">120</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">dest</span>: <span style="color:#ae81ff">/usr/local/bin/k3s-install.sh</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Download K3s binary</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">ansible.builtin.command</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">cmd</span>: <span style="color:#ae81ff">/usr/local/bin/k3s-install.sh</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">INSTALL_K3S_SKIP_START</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">INSTALL_K3S_VERSION</span>: <span style="color:#e6db74">&#34;{{ k3s_version }}&#34;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">changed_when</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make config directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/etc/rancher/k3s&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Copy config file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;k3s-config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#e6db74">&#34;/etc/rancher/k3s/config.yaml&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0644&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">_server_config_result</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make data directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ data_dir }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Make volume directory</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;{{ volume_dir }}&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0755&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">directory</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Copy K3s service file</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.copy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">src</span>: <span style="color:#e6db74">&#34;k3s.service&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">dest</span>: <span style="color:#e6db74">&#34;/etc/systemd/system/k3s.service&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">owner</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">group</span>: <span style="color:#ae81ff">root</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">mode</span>: <span style="color:#e6db74">&#34;0644&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">register</span>: <span style="color:#ae81ff">service_file_single</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Restart K3s service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;] is defined</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;].state == &#39;running&#39;</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">service_file_single.changed or _server_config_result.changed</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k3s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">restarted</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Enable and check K3s service</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">when</span>: <span style="color:#ae81ff">ansible_facts.services[&#39;k3s.service&#39;] is not defined or ansible_facts.services[&#39;k3s.service&#39;].state != &#39;running&#39;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.systemd</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">name</span>: <span style="color:#ae81ff">k3s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">daemon_reload</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">state</span>: <span style="color:#ae81ff">started</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">enabled</span>: <span style="color:#66d9ef">true</span>
</span></span></code></pre></div><p>The nice thing about this role is that it can handle updates reasonably well.
It still feels a bit weird to use a bash script as part of the process, but it
looks like that&rsquo;s really the intended approach for deploying k3s. Worth noting
here is the very first task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">Populate service facts</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">ansible.builtin.service_facts</span>:
</span></span></code></pre></div><p>Without this, at least in my setup, the later checks on <code>ansible_facts.services</code>
do not work, because Ansible&rsquo;s default fact gathering does not include service data.</p>
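<p>The <code>when:</code> conditions on the restart and start tasks boil down to a small decision. As a rough illustration only, here is that logic as a plain bash function. In this sketch, <code>systemctl is-active</code> stands in for <code>ansible_facts.services</code>, which reports <code>running</code> where systemd reports <code>active</code>. The function name and the <code>yes</code>/<code>no</code> convention for the &ldquo;changed&rdquo; flag are made up for this example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical sketch of the role's restart/start decision, not part of the role.
# $1: unit state as printed by "systemctl is-active" (active, inactive, failed, ...)
# $2: "yes" if the unit file or config.yaml changed during this run, else "no"
k3s_service_action() {
    local state="$1" changed="$2"
    if [ "$state" = "active" ]; then
        # Already running: only restart if something relevant changed
        if [ "$changed" = "yes" ]; then
            echo "restart"
        else
            echo "none"
        fi
    else
        # Not running (or not installed yet): enable and start it
        echo "enable-and-start"
    fi
}

# Usage on the host:
#   k3s_service_action "$(systemctl is-active k3s.service)" no
</code></pre></div>
<p>A running service with unchanged config is left alone entirely, which is what makes re-running the role after an update cheap.</p>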
<p>The role also needs some variables defined, which I do in <code>defaults/main.yml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">k3s_version</span>: <span style="color:#ae81ff">v1.33.1+k3s1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data_dir</span>: <span style="color:#e6db74">&#34;/srv/k3s/state&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">volume_dir</span>: <span style="color:#e6db74">&#34;/srv/k3s/volumes&#34;</span>
</span></span></code></pre></div><p>The <code>k3s.service</code> unit file, which the role copies to <code>/etc/systemd/system/</code>, looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-systemd" data-lang="systemd"><span style="display:flex;"><span><span style="color:#66d9ef">[Unit]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Description</span><span style="color:#f92672">=</span><span style="color:#e6db74">Lightweight Kubernetes</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Documentation</span><span style="color:#f92672">=</span><span style="color:#e6db74">https://k3s.io</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Wants</span><span style="color:#f92672">=</span><span style="color:#e6db74">network-online.target</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">After</span><span style="color:#f92672">=</span><span style="color:#e6db74">network-online.target</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">[Install]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">WantedBy</span><span style="color:#f92672">=</span><span style="color:#e6db74">multi-user.target</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">[Service]</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Type</span><span style="color:#f92672">=</span><span style="color:#e6db74">notify</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/default/%N</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/sysconfig/%N</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">EnvironmentFile</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/etc/systemd/system/k3s.service.env</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">KillMode</span><span style="color:#f92672">=</span><span style="color:#e6db74">process</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Delegate</span><span style="color:#f92672">=</span><span style="color:#e6db74">yes</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Having non-zero Limit*s causes performance problems due to accounting overhead</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># in the kernel. We recommend using cgroups to do container-local accounting.</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitNOFILE</span><span style="color:#f92672">=</span><span style="color:#e6db74">1048576</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitNPROC</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">LimitCORE</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">TasksMax</span><span style="color:#f92672">=</span><span style="color:#e6db74">infinity</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">TimeoutStartSec</span><span style="color:#f92672">=</span><span style="color:#e6db74">0</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">Restart</span><span style="color:#f92672">=</span><span style="color:#e6db74">always</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">RestartSec</span><span style="color:#f92672">=</span><span style="color:#e6db74">5s</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStartPre</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/sbin/modprobe br_netfilter</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStartPre</span><span style="color:#f92672">=</span><span style="color:#e6db74">-/sbin/modprobe overlay</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">ExecStart</span><span style="color:#f92672">=</span><span style="color:#e6db74">/usr/local/bin/k3s server</span>
</span></span></code></pre></div><p>And then finally, there&rsquo;s the k3s config file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">tls-san</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;k3s.example.com&#34;</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#e6db74">&#34;192.0.2.100&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">data-dir</span>: <span style="color:#e6db74">&#34;{{ data_dir }}&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">cluster-cidr</span>: <span style="color:#e6db74">&#34;10.42.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">service-cidr</span>: <span style="color:#e6db74">&#34;10.43.0.0/16&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">flannel-backend</span>: <span style="color:#e6db74">&#34;wireguard-native&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">default-local-storage-path</span>: <span style="color:#e6db74">&#34;{{ volume_dir }}&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">disable</span>: <span style="color:#e6db74">&#34;servicelb&#34;</span>
</span></span></code></pre></div><p>Nothing too special here either. I decided to keep k3s&rsquo; default <a href="https://github.com/rancher/local-path-provisioner/tree/master">local-storage provider</a>,
because I need this cluster to be as independent of any other services as possible:
it&rsquo;s going to host everything that serves as the bedrock for the rest of the
Homelab.</p>
<p>Besides that, the last notable action is disabling the <code>servicelb</code> load balancer
service. In short, this is k3s&rsquo; implementation of a simple handler for LoadBalancer
type k8s Services. I couldn&rsquo;t use it because DHCP packets never made
it to the Tinkerbell Pod. I will go into more detail about this in the next
post of the series.</p>
<p>And after an <code>ansible-playbook deployment.yml --limit master</code>, I had a fully
functional k3s cluster. It started up without any issue, deployed Traefik and
was ready for more workloads. I like how little hassle this was, and I find myself
agreeing with k3s&rsquo; claim of being a simple Kubernetes distribution. As far as such
things can be simple. &#x1f60f;</p>
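<p>The install script is pinned via <code>INSTALL_K3S_VERSION</code>, so it can be handy to double-check which version actually ended up on the host after a run. The following is a minimal sketch. It assumes the first line of <code>k3s --version</code> has the form <code>k3s version v1.33.1+k3s1 (commit)</code>. Both function names are hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helpers, not part of the Ansible role.
# Reads "k3s --version" output on stdin and prints the version token,
# assuming a first line like: k3s version v1.33.1+k3s1 (abcdef12)
k3s_installed_version() {
    awk 'NR == 1 { print $3 }'
}

# Succeeds (exit 0) if the installed version differs from the desired one.
k3s_needs_update() {
    local desired="$1" installed="$2"
    [ "$installed" != "$desired" ]
}

# Usage on the host:
#   installed="$(k3s --version | k3s_installed_version)"
#   if k3s_needs_update "v1.33.1+k3s1" "$installed"; then echo "update needed"; fi
</code></pre></div>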
<h2 id="cluster-connection-setups">Cluster connection setups</h2>
<p>Before I finish this post, I would like to talk a little bit about how I
configured access to the new k3s cluster, since I access it from the same host I use
for my main cluster. I ended up going with the alias route, using kubectl&rsquo;s
<code>--context</code> parameter.</p>
<p>Let&rsquo;s first have a look at the updated <code>~/.kube/config</code> file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">clusters</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;BASE64 encoded data here&gt;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k8s.example.com:6443</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-cluster</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">cluster</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">server</span>: <span style="color:#ae81ff">https://k3s.example.com:6443</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">certificate-authority-data</span>: <span style="color:#ae81ff">&lt;Different BASE64 encoded data here&gt;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">management-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">contexts</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">main-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">main-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">context</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cluster</span>: <span style="color:#ae81ff">management-cluster</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">user</span>: <span style="color:#ae81ff">mgm-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mgm-admin@management-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">current-context</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Config</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">preferences</span>: {}
</span></span><span style="display:flex;"><span><span style="color:#f92672">users</span>:
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">main-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">main-creds</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span><span style="display:flex;"><span>- <span style="color:#f92672">name</span>: <span style="color:#ae81ff">mgm-admin</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">user</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">exec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">client.authentication.k8s.io/v1</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: <span style="color:#ae81ff">pass</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">args</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">show</span>
</span></span><span style="display:flex;"><span>        - <span style="color:#ae81ff">mgm-creds</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">interactiveMode</span>: <span style="color:#ae81ff">IfAvailable</span>
</span></span></code></pre></div><p>For more details on this config, and why <a href="https://www.passwordstore.org/">pass</a>
appears in it, have a look at <a href="https://blog.mei-home.net/posts/securing-k8s-credentials/">this post</a>.
Each cluster gets its own context definition, and each cluster has a different user.</p>
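<p>Instead of editing the file by hand, the management cluster&rsquo;s entries can also be added with <code>kubectl config</code> subcommands. This sketch assumes the k3s server&rsquo;s CA certificate has already been copied to a local file. Both the function name and that file path are hypothetical. Depending on the kubectl version, the <code>interactiveMode</code> field of the exec entry may still need to be adjusted by hand afterwards:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: adds cluster, user and context entries for the
# management cluster to the active kubeconfig (~/.kube/config by default).
# $1: path to a local copy of the k3s server CA certificate (assumption)
add_management_cluster() {
    local ca_file="$1"
    # Cluster entry; --embed-certs stores the CA as certificate-authority-data
    kubectl config set-cluster management-cluster \
        --server=https://k3s.example.com:6443 \
        --certificate-authority="$ca_file" \
        --embed-certs=true || return 1
    # User entry that fetches credentials via pass, like the config above
    kubectl config set-credentials mgm-admin \
        --exec-api-version=client.authentication.k8s.io/v1 \
        --exec-command=pass \
        --exec-arg=show \
        --exec-arg=mgm-creds || return 1
    # Context tying the cluster and user together
    kubectl config set-context mgm-admin@management-cluster \
        --cluster=management-cluster \
        --user=mgm-admin
}
</code></pre></div>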
<p>The aliases for kubectl then look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>alias k<span style="color:#f92672">=</span>kubectl<span style="color:#ae81ff">\ </span>--context<span style="color:#f92672">=</span>main-admin@main-cluster
</span></span><span style="display:flex;"><span>alias k-master<span style="color:#f92672">=</span>kubectl<span style="color:#ae81ff">\ </span>--context<span style="color:#f92672">=</span>mgm-admin@management-cluster
</span></span></code></pre></div><p>So with <code>k</code>, I&rsquo;m getting my main cluster. I decided to keep the alias I had
originally created for that cluster, instead of renaming it to e.g. <code>k-main</code>. I&rsquo;ve
started to question this decision, and I would advise anyone replicating my setup
not to reuse an existing alias like this: inevitably, you will end up using the main
cluster&rsquo;s alias when you actually meant to talk to the management cluster.</p>
<p>Typing <code>k</code> whenever I want to do something with the k8s cluster has become pretty
ingrained over more than a year of use.</p>
<p>One random comment for when you&rsquo;re using a similar setup with autocompletion:
Don&rsquo;t surround the alias definition with quotation marks, e.g. like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>alias k<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;kubectl --context=main-admin@main-cluster&#34;</span>
</span></span></code></pre></div><p>The alias itself will work, but autocomplete won&rsquo;t. That&rsquo;s why I&rsquo;m using the <code>\ </code>
syntax instead. Apropos autocomplete, you need to explicitly tell bash to
autocomplete on aliases. For example like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>source ~/.kube/kubectl-comp
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[[</span> <span style="color:#66d9ef">$(</span>type -t compopt<span style="color:#66d9ef">)</span> <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;builtin&#34;</span> <span style="color:#f92672">]]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    complete -o default -F __start_kubectl k kmaster
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">else</span>
</span></span><span style="display:flex;"><span>    complete -o default -o nospace -F __start_kubectl k kmaster
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>The <code>__start_kubectl</code> function is defined in the autocomplete script provided
by kubectl when running <code>kubectl completion bash</code>.</p>
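<p>If more aliases get added over time, the <code>complete</code> calls can be wrapped in a small function. This is only a sketch of the same logic as the snippet above, and the function name is made up. It assumes the kubectl completion script has already been sourced, so that <code>__start_kubectl</code> exists:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: register kubectl completion for each alias name given.
# Assumes the output of "kubectl completion bash" was sourced beforehand.
register_kubectl_alias_completion() {
    # Bail out if the kubectl completion function is not defined
    if [ -z "$(declare -F __start_kubectl)" ]; then
        echo "__start_kubectl not defined, source the kubectl completion script first"
        return 1
    fi
    # Same compopt check as in the snippet above
    if [ "$(type -t compopt)" = "builtin" ]; then
        complete -o default -F __start_kubectl "$@"
    else
        complete -o default -o nospace -F __start_kubectl "$@"
    fi
}

# Usage, matching the aliases defined above:
#   register_kubectl_alias_completion k k-master
</code></pre></div>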
<p>Finally, I wrote about how I&rsquo;m using Helmfile to manage the deployments on my
Kubernetes cluster in the <a href="https://blog.mei-home.net/posts/helmfile/">last post</a>.
Luckily, Helmfile already has an option to set the context right in the Helmfile:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">helmDefaults</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">kubeContext</span>: <span style="color:#ae81ff">main-admin@main-cluster</span>
</span></span></code></pre></div><p>This removes the danger of deploying to the wrong cluster just because the wrong
context happens to be active, although commands like <code>destroy</code> might still be dangerous
when I&rsquo;ve got releases with the same names in both Helmfiles. &#x1f62c;</p>
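<p>One possible safety net for that remaining risk is a wrapper that refuses to run a Helmfile command unless the given file pins the context I actually intend to target. This is a rough sketch with a hypothetical function name. The simple <code>grep</code> match assumes <code>kubeContext</code> is written unquoted on a single line, as shown above:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical wrapper around helmfile, not part of Helmfile itself.
# $1: path to the Helmfile, $2: kube context the command is meant for,
# remaining arguments are passed on to helmfile unchanged.
helmfile_for_context() {
    local file="$1" expected="$2"
    shift 2
    # Only proceed if the file pins exactly the expected context
    if ! grep -Eq "^[[:space:]]*kubeContext:[[:space:]]*${expected}[[:space:]]*$" "$file"; then
        echo "refusing: $file does not pin kubeContext ${expected}"
        return 1
    fi
    helmfile -f "$file" "$@"
}

# Usage:
#   helmfile_for_context helmfile.yaml mgm-admin@management-cluster destroy
</code></pre></div>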
<h2 id="finale">Finale</h2>
<p>And that completes this part of the setup. The next one will be about the setup
of Tinkerbell itself, and I will likely combine it with the provisioning of the
first VM with Tinkerbell.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Tinkerbell Part I: The Plan</title>
      <link>https://blog.mei-home.net/posts/tinkerbell-1-plan/</link>
      <pubDate>Thu, 29 May 2025 12:00:13 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/tinkerbell-1-plan/</guid>
      <description>Intro to the Tinkerbell deployment</description>
      <content:encoded><![CDATA[<p>A rough overview of my plan for trialing tinkerbell in my Homelab.</p>
<p>This is part 1 of my <a href="https://blog.mei-home.net/tags/series-tinkerbell/">tinkerbell series</a>.</p>
<p>I&rsquo;m planning to trial <a href="https://tinkerbell.org/">tinkerbell</a> in my Homelab to
improve my baremetal provisioning setup. This first post will be the plan and
the reason why I&rsquo;m doing this.</p>
<p>Tinkerbell is a system for provisioning baremetal machines. It is deployed into
a Kubernetes cluster and consists of a controller, a DHCP/netboot server, a
metadata provider e.g. for cloud-init data, and an in-memory OS for running
workflows. The basic idea is that new machines netboot into that in-memory OS
and execute workflows configured in tinkerbell to install the actual OS.</p>
<h2 id="the-current-provisioning-setup">The current provisioning setup</h2>
<p>Before going into detail on the plan for the future, let&rsquo;s have a look at what
my provisioning pipeline currently looks like.</p>
<p>The first step of any setup is to create an individual disk image for the new machine.
I&rsquo;ve standardized on Ubuntu server for all of my Homelab hosts, as it supports
Raspberry Pis well and thus allows me to run the same Linux distro on the entire
Homelab. The image generation varies a bit between Pis and x86 hosts. But both
use HashiCorp&rsquo;s <a href="https://developer.hashicorp.com/packer">Packer</a> to create
an image, followed by a short Ansible playbook which prepares the image for
further provisioning with my main Ansible playbook.</p>
<p>For my Pis, this preparation is done in a chroot with qemu-arm-static, based
on Ubuntu&rsquo;s preinstalled Pi images. For x86 hosts, a normal Ubuntu install is
run in a Qemu VM. Once the image is prepared, I stick a USB drive into the new
host and <code>dd</code> the image onto the disk, either a local disk or a Ceph RBD, depending
on whether it&rsquo;s a diskless host or not.</p>
<p>And this, overall, seems rather unnecessarily complicated and manual. First of
all, the short Ansible playbook I run to prepare the image for further provisioning
only does the following:</p>
<ul>
<li>Installs a couple of packages Ansible needs to run, e.g. Python</li>
<li>Adds my standard Homelab Ansible user, sets up <code>sudo</code> and deploys the SSH key</li>
<li>Sets the hostname</li>
</ul>
<p>For netbooting hosts, it does a few more things:</p>
<ul>
<li>Sets the boot partition to point to the correct NFS mount</li>
<li>Sets the kernel command line to mount the right RBD</li>
</ul>
<p>Most of these steps could be done via <a href="https://cloud-init.io/">cloud-init</a>, removing
the need to generate individual images per host entirely. This is one big goal
of the tinkerbell introduction: Getting rid of per-host images and ending up with
only two base images, one for Pis and one for x86 hosts.</p>
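<p>To make that concrete: the host-specific parts of the preparation playbook map onto a handful of standard cloud-init keys. Below is a sketch of a shell function that renders such a per-host user-data document. The function name, the example user name and the <code>python3</code> package are illustrative assumptions, not my actual configuration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"># Hypothetical helper: print cloud-init user-data for one host.
# $1: hostname, $2: user name, $3: SSH public key (one line)
render_user_data() {
    local host="$1" user="$2" key="$3"
    printf '#cloud-config\n'
    printf 'hostname: %s\n' "$host"
    # Equivalent of the playbook's user, sudo and SSH key steps
    printf 'users:\n'
    printf '  - name: %s\n' "$user"
    printf '    shell: /bin/bash\n'
    printf '    sudo: ALL=(ALL) NOPASSWD:ALL\n'
    printf '    ssh_authorized_keys:\n'
    printf '      - %s\n' "$key"
    # Packages Ansible needs on the target, e.g. Python
    printf 'packages:\n'
    printf '  - python3\n'
}

# Usage:
#   render_user_data newhost ansible "$(cat ~/.ssh/id_ed25519.pub)"
</code></pre></div>
<p>With something like this, the base image stays identical for every host, and only this small document differs per machine.</p>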
<p>In addition, I&rsquo;m hoping that tinkerbell&rsquo;s workflows will let me automate the
image install as well, so I no longer need to boot from a USB stick and write the
image manually.</p>
<h2 id="the-plan">The plan</h2>
<p>I recently bought a couple of Raspberry Pi 5 to replace my Kubernetes control
plane nodes. When I did so, I ordered one additional Pi, with 16 GB of RAM and
a 1 TB SSD. That Pi will soon replace what I call my &ldquo;Cluster Master&rdquo;. It&rsquo;s a
host explicitly intended to be bootable and run its services without any external
dependencies. It, in turn, then hosts foundational services for the rest of the Homelab.
That machine will host a new Kubernetes cluster for tinkerbell.</p>
<p>But I will not jump right into that setup. Instead, I plan to first make a setup
on my desktop with a couple of VMs to kick the tires on tinkerbell, because there
are a couple of open questions:</p>
<ol>
<li>How exactly does the DHCP server behave? Does it run in proxy mode? Does it
have to be the only DHCP server in the subnet?</li>
<li>How does tinkerbell work in general?</li>
<li>Can I make tinkerbell work with Pi 4? What about Pi 5?</li>
<li>Can I make tinkerbell work with my netboot setup?</li>
</ol>
<p>All of these will be answered in the experimental phase. The general answer to
question 3, at least for the Pi 4, seems to be &ldquo;Eh, possibly&rdquo;. This is also the
biggest stumbling block I see. As I noted above, tinkerbell runs an in-memory OS
to execute its workflow for installing the main OS. So the main challenge will be
to get the Pis booted into that OS. But then again, the Pi netboot can already
boot into a given kernel and initramfs. So unless tinkerbell somehow has a hard
requirement on iPXE boot, I should be able to somehow get it to work on the Pis.
I expect this to be the most fun part of the entire endeavor. &#x1f913;</p>
<p>For this experimentation phase, I intend to set up a lab environment on my desktop.
I decided to do this for two reasons:</p>
<ol>
<li>I need to isolate it from the Homelab for now, because Tinkerbell runs a DHCP
server</li>
<li>My past work on netboot has shown that doing the experimentation on a VM you
can easily interact with is a huge advantage</li>
</ol>
<p>I actually thought a lot about how to manage the VMs for this setup on my desktop.
I got burned pretty hard by VirtualBox in the past, so that was out. The last
time I set up a VM lab on my desktop, I used QEMU directly, with a bit of bash
scripting around it. See <a href="https://blog.mei-home.net/posts/testvm-for-netbooting/">this post</a>
if you&rsquo;re interested. What I was looking for this time was something in between
&ldquo;needs a daemon running&rdquo; and &ldquo;big ball of bash&rdquo;.
I looked at <a href="https://developer.hashicorp.com/vagrant">HashiCorp&rsquo;s Vagrant</a>
first and will give it a try with the <a href="https://github.com/ppggff/vagrant-qemu">QEMU provider</a>.
If that does not work out for some reason, I will instead use <a href="https://linuxcontainers.org/incus/">Incus</a>.
It&rsquo;s a bit more than I really want to set up on my desktop, but on the other hand
I&rsquo;m pretty familiar with LXD VMs.
The big advantage of Vagrant is that there&rsquo;s no daemon running, and I get
version-controllable configs out of the box. For Incus, I&rsquo;d also set up OpenTofu,
so I could put the config under version control, instead of ending up with a docs
page listing the CLI commands to execute in order to set it all up.</p>
<p>Once that&rsquo;s done, I will have to set up a Kubernetes cluster on the VM to install
Tinkerbell into. I&rsquo;m currently planning to use k3s: it seems to be the default
choice for single-node clusters, it&rsquo;s supposed to be relatively lightweight, and
from what I&rsquo;ve read, it runs quite nicely in a single-node setup.</p>
<p>The Kubernetes setup will happen regardless of whether I ultimately deploy Tinkerbell.
My main reason is that I&rsquo;d like to standardize as much as possible on
deploying everything with Kubernetes, even outside the main cluster. This will
also mean looking into deploying the apps currently running on bare metal on my
Cluster Master with Kubernetes. The main one is Dnsmasq, which provides the TFTP
boot server for my diskless hosts.
But I also have further plans for a &ldquo;management&rdquo; style Kubernetes cluster. Namely,
I want to try out GitOps for my Kubernetes cluster, for example with ArgoCD.
That also calls for a separate cluster setup. And finally, I would also like to
try out Cluster API, just for the fun of it.</p>
<p>So overall, the plan entails the following steps:</p>
<ol>
<li>Create a new VLAN to properly isolate the Tinkerbell experiment, and specifically
its DHCP server</li>
<li>Set up a VM with Vagrant or Incus on my desktop for experimentation</li>
<li>Create a k3s single-node cluster on the VM</li>
<li>Install Tinkerbell in the cluster</li>
<li>Kick the tires by provisioning a second VM</li>
<li>Try to get provisioning working on a Pi 4 and a Pi 5</li>
<li>If everything works, deploy it in the Homelab</li>
</ol>
<p>And that&rsquo;s already it on the planning front. This is a lot more experimental
than my Kubernetes migration was, so there&rsquo;s not that much to plan up front. I
didn&rsquo;t need a single flow chart. &#x1f601;</p>
<p>Next will be a post on the lab setup on my desktop, once I&rsquo;ve got that running.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
