A rough overview of my plan for trialing tinkerbell in my Homelab.

This is part 1 of my tinkerbell series.

I’m planning to trial tinkerbell in my Homelab to improve my baremetal provisioning setup. This first post covers the plan and my reasons for doing this.

Tinkerbell is a system for provisioning baremetal machines. It is deployed into a Kubernetes cluster and consists of a controller (Tink), a DHCP/netboot server (Smee), a metadata provider (Hegel, serving e.g. cloud-init data), and an in-memory OS (HookOS) for running workflows. The basic idea is that new machines netboot into that in-memory OS and execute workflows configured in tinkerbell to install the actual OS.
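
From my reading so far, each machine gets registered with tinkerbell as a Hardware resource describing its NIC and netboot settings. A rough sketch of what that might look like follows; the MAC and IP values are placeholders, and I haven’t verified this against a real deployment yet:

```yaml
# Sketch of a tinkerbell Hardware resource (MAC/IP values are placeholders).
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: node1
spec:
  interfaces:
    - dhcp:
        mac: "aa:bb:cc:dd:ee:01" # used to match the machine's DHCP requests
        hostname: node1
        ip:
          address: 10.0.0.21
          netmask: 255.255.255.0
          gateway: 10.0.0.1
      netboot:
        allowPXE: true      # let the netboot server answer this machine
        allowWorkflow: true # allow workflows to run on this machine
  metadata:
    instance:
      hostname: node1
```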

The current provisioning setup

Before going into detail on the plan for the future, let’s have a look at what my provisioning pipeline currently looks like.

The first step of any setup is to create an individual disk image for the new machine. I’ve standardized on Ubuntu Server for all of my Homelab hosts, as it supports Raspberry Pis well and thus allows me to run the same Linux distro on the entire Homelab. The image generation varies a bit between Pis and x86 hosts, but both use HashiCorp’s Packer to create an image, followed by a short Ansible playbook that prepares the image for further provisioning with my main Ansible playbook.

For my Pis, this preparation is done in a chroot with qemu-arm-static, based on Ubuntu’s preinstalled Pi images. For x86 hosts, a normal Ubuntu install is run in a QEMU VM. Once the image is prepared, I stick a USB drive into the new host and dd the image onto the disk: onto a Ceph RBD for diskless hosts, onto a local disk for everything else.

And this, overall, seems unnecessarily complicated and manual. First of all, the short Ansible playbook I run to prepare the image for further provisioning only does the following:

  • Installs a couple of packages Ansible needs to run, e.g. Python
  • Adds my standard Homelab Ansible user, sets up sudo and deploys the SSH key
  • Sets the hostname

For netbooting hosts, it does a few more things:

  • Sets the boot partition to point to the correct NFS mount
  • Sets the kernel command line to mount the right RBD

Most of these steps could be done via cloud-init, entirely removing the need to generate individual images per host. This is one big goal of the tinkerbell introduction: getting rid of per-host images and ending up with only two base images, one for Pis and one for x86 hosts.
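
To illustrate, the whole prep playbook could collapse into a short cloud-init user-data file along these lines (user name and SSH key are placeholders; the extra steps for netbooting hosts would still need separate handling):

```yaml
#cloud-config
# Rough equivalent of my image-prep playbook (placeholder user and key).
hostname: node1
package_update: true
packages:
  - python3 # Ansible needs a Python interpreter on the target
users:
  - name: ansible # my standard Homelab Ansible user (placeholder name)
    groups: [sudo]
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... ansible@homelab
```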

In addition, I’m hoping that tinkerbell’s workflows allow me to automate the image install as well, so I can get rid of the need to boot from a USB stick and dd the image manually.
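
From what I’ve read so far, the install would then be a tinkerbell workflow template whose actions stream a base image straight to the target disk. A sketch of the idea, with the image URL, target disk, and timeouts being pure guesses on my part:

```yaml
# Sketch of a tinkerbell Template streaming a base image to disk (values are guesses).
apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: ubuntu-base-install
spec:
  data: |
    version: "0.1"
    name: ubuntu-base-install
    global_timeout: 1800
    tasks:
      - name: "os-install"
        worker: "{{.device_1}}"
        actions:
          - name: "stream-image"
            image: quay.io/tinkerbell/actions/image2disk:latest
            timeout: 600
            environment:
              DEST_DISK: /dev/sda                              # placeholder target disk
              IMG_URL: http://10.0.0.2:8080/ubuntu-base.img.gz # placeholder image server
              COMPRESSED: true
```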

The plan

I recently bought a couple of Raspberry Pi 5s to replace my Kubernetes control plane nodes. When I did so, I ordered one additional Pi, with 16 GB of RAM and a 1 TB SSD. That Pi will soon replace what I call my “Cluster Master”. It’s a host explicitly intended to be bootable and run its services without any external dependencies. It, in turn, hosts foundational services for the rest of the Homelab. That machine will host a new Kubernetes cluster for tinkerbell.

But I will not jump right into that setup. Instead, I plan to first set up a couple of VMs on my desktop to kick the tires on tinkerbell, because there are a couple of open questions:

  1. How exactly does the DHCP server behave? Does it run in proxy mode? Does it have to be the only DHCP server in the subnet?
  2. How does tinkerbell work in general?
  3. Can I make tinkerbell work with Pi 4? What about Pi 5?
  4. Can I make tinkerbell work with my netboot setup?

All of these will be answered in the experimental phase. The general answer to question 3), at least for the Pi 4, seems to be “Eh, possibly”. This is also the biggest stumbling block I see. As I noted above, tinkerbell runs an in-memory OS to execute its workflows for installing the main OS. So the main challenge will be to get the Pis booted into that OS. But then again, the Pi netboot can already boot a given kernel and initramfs. So unless tinkerbell somehow has a hard requirement on iPXE boot, I should be able to get it working on the Pis. I expect this to be the most fun part of the entire endeavor. 🤓

For this experimentation phase, I intend to set up a lab environment on my desktop. I decided to do this for two reasons:

  1. I need to isolate it from the Homelab for now, due to tinkerbell running a DHCP server
  2. My past work on netboot has shown that experimenting on a VM you can easily interact with is a huge advantage

I actually thought a lot about how to manage the VMs for this setup on my desktop. I got burned pretty hard by VirtualBox in the past, so that was out. The last time I set up a VM lab on my desktop, I used QEMU directly, with a bit of bash scripting around it. See this post if you’re interested. What I was looking for this time was something in between “needs a daemon running” and “big ball of bash”. I looked at HashiCorp’s Vagrant first and will give it a try with the QEMU provider. Its big advantage is that there’s no daemon running, and I get version-controllable configs out of the box. If that does not work out for some reason, I will instead use Incus. It’s a bit more than I really want to set up on my desktop, but on the other hand I’m pretty familiar with LXD VMs. For Incus, I’d also set up OpenTofu, so I could put the config under version control, instead of ending up with a docs page listing the CLI commands to execute in order to set it all up.

Once that’s done, I will have to set up a Kubernetes cluster on the VM to install tinkerbell into. I’m currently planning to use k3s: it’s supposed to be relatively lightweight, it seems to run quite nicely in a single-node setup from what I’ve read, and it appears to be the default choice for single-node clusters.
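
I expect the k3s side to be little more than the install script plus a small config file; something like the following, where disabling the bundled Traefik and service load balancer is an assumption I still need to verify against tinkerbell’s requirements:

```yaml
# /etc/rancher/k3s/config.yaml -- minimal single-node setup (assumptions, to be verified)
write-kubeconfig-mode: "0644"
disable:
  - traefik   # keep the bundled ingress out of the way for now
  - servicelb # tinkerbell's DHCP/netboot services may need host ports or their own LB
```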

This setup will happen regardless of whether I ultimately deploy tinkerbell or not. My main reason is that I’d like to standardize as much as possible on deploying everything with Kubernetes, even outside the main cluster. This will also entail looking at deploying the apps currently running baremetal on my Cluster Master, the main one being dnsmasq, which provides a TFTP boot server for my diskless hosts. But I have further plans for a “management” style Kubernetes cluster as well: I want to try out GitOps for my Kubernetes clusters, for example with ArgoCD, which also calls for a separate cluster setup. And finally, I would like to trial Cluster API, just for the fun of it.

So overall, the plan entails the following steps:

  1. Create a new VLAN to properly isolate the tinkerbell experiment, and specifically its DHCP server
  2. Set up a VM with Vagrant or Incus on my desktop for experimentation
  3. Create a k3s single-node cluster on the VM
  4. Install tinkerbell in the cluster
  5. Kick the tires for provisioning a second VM
  6. Try to get provisioning working on a Pi 4 and a Pi 5
  7. If everything works, deploy it in the Homelab

And that’s it already on the planning front. This is a lot more experimental than my Kubernetes migration was, so there’s not that much to plan up front. I didn’t need a single flow chart. 😁

Next will be a post on the lab setup on my desktop, once I’ve got that running.