In this post, I will show how I provisioned a Raspberry Pi 4 with an attached USB SSD via Tinkerbell.

This is part 4 of my Tinkerbell series.

The main goal of this post is to get this little guy to boot into Tinkerbell’s HookOS and install an Ubuntu 24.04 Raspberry Pi image onto the SSD:

A picture of a desk with a Raspberry Pi 4 board and accessories. The Pi 4 is clad in a passive red heat sink and mounted on a right-angle piece of metal. It's connected to a small 7-inch screen with an HDMI and a USB cable. It's also connected to a keyboard and has a network cable plugged in. Finally, it's connected to a 2.5-inch Kingston SATA SSD via a USB-to-SATA adapter.

My experimental setup.

To get the Ubuntu image onto the SSD and have the Pi boot from it, the following steps need to be executed:

  1. Boot the Pi into a UEFI firmware via the Pi’s weird PXE boot procedure
  2. From the UEFI firmware, boot into iPXE, again via PXE boot
  3. Fetch the iPXE script to execute HookOS from Tinkerbell. Again, you guessed it, via PXE
  4. Finally, boot HookOS itself

Raspberry Pi PXE boot to UEFI

To understand the rest of this post, let’s start with a quick look at the Raspberry Pi’s netboot process. It all starts with a DHCP request. The direct reply to that request might already contain a TFTP server address. If it doesn’t, the Pi’s firmware will also wait for a Proxy DHCP reply. This makes it possible to split the normal DHCP server doing IP address management from the DHCP server that supplies the PXE boot parameters.

When a TFTP server address is indeed received, the Pi starts to download files from it. The boot file option that can also be supplied for PXE is not supported by the Pi netboot process. It doesn’t matter what that option is set to, or whether it’s sent in the DHCP reply at all. It’s just ignored. The first file downloaded is config.txt, which contains the configuration for the firmware. Relevant to the boot process are the options for the kernel and initramfs as well as, in this particular case, the armstub option. The armstub tells the boot firmware, which runs on the GPU on this SoC, what to load onto the ARM CPU cores after the initial boot. By default, the firmware just loads the kernel and initramfs, leading to Linux being started. But when armstub is set, the given file is loaded instead. In all three cases, the files are fetched from the TFTP server when netbooting.

To load the Pi in UEFI mode, I’ve been using this repository. Initially, I thought I needed this “special” firmware to get iPXE running, but it turns out that the iPXE project already provides the snp.efi file, which is compatible with a Pi 4 booted into UEFI.

So for now, the goal is to get the Pi booted into the UEFI stub.
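
Since the firmware silently falls back to its defaults when a file is missing, a small preflight check of the TFTP root can save some head-scratching. The following is a minimal sketch, not part of any existing tool. It assumes the armstub is set with a plain armstub= line and ignores conditional sections such as [pi4]. The function names and the RPI_EFI.fd demo file are only there for illustration.

```shell
#!/bin/sh
# Preflight check for a Pi netboot TFTP root (a sketch, not part of any tool).

# Print the value of the last "armstub=" line of a config.txt read from stdin.
# Simplification: conditional sections such as [pi4] are ignored.
armstub_from_config() {
    sed -n 's/^armstub=//p' | tail -n 1
}

# Check that config.txt exists in the given directory and that the armstub
# file it names is present next to it.
check_tftp_root() {
    root=$1
    [ -f "$root/config.txt" ] || { echo "missing $root/config.txt" >&2; return 1; }
    stub=$(armstub_from_config < "$root/config.txt")
    [ -n "$stub" ] || { echo "no armstub set in config.txt" >&2; return 1; }
    [ -f "$root/$stub" ] || { echo "missing $root/$stub" >&2; return 1; }
    echo "firmware will load $stub"
}

# Demo against a throwaway directory that mimics a UEFI netboot layout.
demo=$(mktemp -d)
printf 'arm_64bit=1\narmstub=RPI_EFI.fd\n' > "$demo/config.txt"
: > "$demo/RPI_EFI.fd"
check_tftp_root "$demo"
rm -rf "$demo"
```

Pointing check_tftp_root at the real TFTP directory reports which file the firmware would hand to the ARM cores, or what is missing.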

Dnsmasq server setup

To supply all of these files via TFTP, I needed a TFTP server. While Tinkerbell does provide TFTP capabilities, those are very rudimentary and only intended to provide the iPXE binary for PXE booting hosts, and nothing more.

As I’ve already got a Dnsmasq instance running in my Homelab, for my regular netbooters, I decided to use it here as well. And that was quite a ride in and of itself, because of the way Kubernetes networking and DHCP work.

I set Dnsmasq up on my k3s test cluster running on a VM. I could not make the Pod use host networking, because Tinkerbell, which also needed to listen on port 67 for DHCP, was already running on the same host. So I decided to use the same trick that Tinkerbell uses, a macvlan type interface. This type of Linux interface is attached to a real physical interface, but gets a different MAC, so it’s basically a completely separate interface. The rest of the network just knows that there are now two MAC addresses behind the given switch port instead of just one. Tinkerbell takes the same approach, see here.

This script creates an additional interface, which piggy-backs off of the physical interface to make it possible for a Pod to receive and send broadcast packets. With just the VIP created by kube-vip for LoadBalancer services, broadcast packets are not forwarded to the Pod, and Dnsmasq never sees them. This is a problem because the initial DHCP discover packets are sent as broadcasts: the host hasn’t been configured yet and doesn’t know about the DHCP server in the subnet.

After configuring the macvlan interface, I tried this Dnsmasq configuration:

port=0
dhcp-range=203.0.113.255,proxy
log-dhcp
enable-tftp
tftp-root=/tftp-files
pxe-service=0,"Raspberry Pi Boot",203.0.113.17

Together with the manually created macvlan interface, Dnsmasq was able to receive the broadcast packets - but it wasn’t able to answer them. Instead, I got this line in the logs:

dnsmasq-dhcp[18135]: no address range available for DHCP request via macvlandnsm

After some digging, I figured out that the issue was that Dnsmasq uses the subnet of the interface where a DHCP request arrives to determine which dhcp-range parameter to use for the answer. And in this case, the macvlandnsm interface gets the hardcoded 127.1.1.1 IP in the script. So I changed the dhcp-range parameter like this:

dhcp-range=127.1.1.255,proxy

And this “worked”:

dnsmasq[2837]: started, version 2.91 DNS disabled
dnsmasq[2837]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth no-DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp[2837]: DHCP, proxy on subnet 127.1.1.255
dnsmasq-tftp[2837]: TFTP root is /tftp-files
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[2837]: 2783272004 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[2837]: 2783272004 tags: macvlandnsm
dnsmasq-dhcp[2837]: 2783272004 broadcast response
dnsmasq-dhcp[2837]: 2783272004 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1
dnsmasq-dhcp[2837]: 2783272004 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[2837]: 2783272004 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[2837]: 2783272004 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[2837]: 2783272004 available DHCP subnet: 127.1.1.255/255.0.0.0
dnsmasq-dhcp[2837]: 2783272004 vendor class: PXEClient:Arch:00000:UNDI:002001

So Dnsmasq did receive the DHCP request, and it also answered it. But have a closer look at this line:

dnsmasq-dhcp[2837]: 2783272004 sent size:  4 option: 54 server-identifier  127.1.1.1

Note the 127.1.1.1 IP returned by Dnsmasq to the netbooting Pi. That’s what the Pi uses as the TFTP server. And of course, that address is from the loopback range, and hence isn’t reachable from the Pi at all.

After some additional tinkering and testing, I settled on assigning the macvlandnsm interface a routable IP, with a /24 prefix instead of /32. Then I reset the dhcp-range option to contain the actual subnet:

dhcp-range=203.0.113.255,proxy
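
The subnet matching described above can be illustrated with a few lines of shell. This is only a sketch of the behaviour I observed, not Dnsmasq's actual implementation. The helper names ip_to_int and same_subnet are made up for this example.

```shell
#!/bin/sh
# Illustration of "does this dhcp-range address belong to the interface's subnet?"
# A sketch of the observed behaviour, not Dnsmasq's real code.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if both addresses fall into the same network for the given prefix length.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Interface with a routable /24 address: the range address is in the same network.
same_subnet 203.0.113.18 203.0.113.255 24 && echo "range matches 203.0.113.18/24"

# Interface with the hardcoded loopback /32 address: nothing in common with the range.
same_subnet 127.1.1.1 203.0.113.255 32 || echo "range does not match 127.1.1.1/32"
```

The second case corresponds to the “no address range available” message from the first attempt.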

With these changes, the Pi was then able to boot into UEFI:

dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-dhcp[20493]: 2783272890 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[20493]: 2783272890 tags: macvlandnsm
dnsmasq-dhcp[20493]: 2783272890 broadcast response
dnsmasq-dhcp[20493]: 2783272890 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp[20493]: 2783272890 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[20493]: 2783272890 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[20493]: 2783272890 sent size: 17 option: 97 client-machine-id  00:34:69:50:52:15:31:c0:00:01:bc:f4:ce:cb...
dnsmasq-dhcp[20493]: 2783272890 sent size: 41 option: 43 vendor-encap  06:01:03:0a:04:00:50:58:45:08:07:80:00:01...
dnsmasq-dhcp[20493]: 2783272890 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[20493]: 2783272890 vendor class: PXEClient:Arch:00000:UNDI:002001
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/start4.elf to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/fixup4.dat to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/bcm2711-rpi-4-b.dtb to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/config.txt to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/miniuart-bt.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/overlays/upstream-pi4.dtbo to 203.0.113.70
dnsmasq-tftp[20493]: sent /tftp-files/RPI_EFI.fd to 203.0.113.70

I have removed a number of lines from the log output where the Pi aborted the transmission. The Pi’s firmware uses such aborted transfers to check whether a certain file is present on the TFTP server and to decide what to download next.

To provide the files in the /tftp-files directory in the Dnsmasq Pod, I used this release. I took the rpi4-uefi-ipxe.zip file and unpacked it all in the /tftp-files dir, to which I had mounted a PersistentVolume.

I’ve also simplified Tinkerbell’s manual interface setup script a bit to use it with Dnsmasq. It now looks like this:

#!/usr/bin/env sh

# Script taken from Tinkerbell: https://raw.githubusercontent.com/tinkerbell/tinkerbell/refs/heads/main/helm/tinkerbell/templates/host-interface-config-map.yaml

# This script allows us to listen and respond to DHCP requests on a host network interface and interact with Dnsmasq.

set -xeuo pipefail

function usage() {
    echo "Usage: $0 [OPTION]..."
    echo "Init script for setting up a network interface to listen and respond to DHCP requests from the Host and move it into a container."
    echo
    echo "Options:"
    echo "  -s, --src     Source interface for listening and responding to DHCP requests (default: default gateway interface)"
    echo "  -t, --type    Create the interface of type, must be either ipvlan or macvlan (default: macvlan)"
    echo "  -c, --clean   Clean up any interfaces created"
    echo "  -h, --help    Display this help and exit"
}

function binary_exists() {
    command -v "$1" >/dev/null 2>&1
}

function main() {
    local src_interface="$1"
    local interface_type="$2"
    local interface_mode="$3"
    local interface_name="macvlandnsm"

    # Preparation
    # Delete existing interfaces in the container
    ip link del ${interface_name} || true
    # Delete existing interfaces in the host namespace
    nsenter -t1 -n ip link del ${interface_name} || true
    # Create the interface
    echo  "Creating interface ${interface_name} of type ${interface_type} with mode ${interface_mode} linked to ${src_interface}"
    nsenter -t1 -n ip link add "${interface_name}" link "${src_interface}" type "${interface_type}" mode "${interface_mode}" || true
    # Move the interface into the Pod container
    pid=$(echo $$)
    echo "Moving interface ${interface_name} into container with PID ${pid}"
    nsenter -t1 -n ip link set "${interface_name}" netns ${pid} || nsenter -t1 -n ip link delete "${interface_name}"
    # Bring up the interface
    ip link set dev "${interface_name}" up
    # Set the IP address
    ip addr add 203.0.113.18/24 dev "${interface_name}" noprefixroute || true
}

src_interface=""
interface_type="macvlan"
interface_mode="bridge"
clean=false
# s: means -s requires an argument
# s:: means -s has an optional argument
# s (without colon) means -s doesn't accept arguments
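if ! args=$(getopt -a -o s::ch --long src::,clean,help -- "$@"); then
    # getopt could not parse the arguments: show the usage text and stop.
    # (Checking $? afterwards would never run, because set -e exits first.)
    usage
    exit 1
fi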

eval set -- ${args}
while :
do
  case $1 in
    -s | --src)
      # If $2 starts with '-' or is empty (--), it's not a value but another option
      if [[ "$2" == "--" || "$2" == -* ]]; then
          src_interface=""
          shift
      else
          src_interface="$2"
          shift 2
      fi
      ;;
    -c | --clean)
      clean=true
      shift ;;
    -h | --help)
      usage
      exit 1
      shift ;;
    # -- means the end of the arguments; drop this, and break out of the while loop
    --) shift; break ;;
    *) >&2 echo "Unsupported option: $1"
      usage
      # Exit here, otherwise the loop would spin forever on the same argument
      exit 1 ;;
  esac
done

if [[ -z "${src_interface}" ]]; then
    src_interface=$(nsenter -t1 -n ip route | awk '/default/ {print $5}' | head -n1)
fi

if "${clean}"; then
    # Delete existing interfaces in the container
    ip link del macvlandnsm || true
    # Delete existing interfaces in the host namespace
    nsenter -t1 -n ip link del macvlandnsm || true
    exit 0
fi
main "${src_interface}" "${interface_type}" "${interface_mode}"

Here is the current state of the boot:

A picture of a screen showing the Pi booted into the UEFI firmware. In the background, it shows the Raspberry Pi raspberry. At the bottom, several shortcuts are shown to enter setup, the shell or continue booting. At the top, text showing an attempt to do a PXE boot via IPv4 and IPv6 is displayed. In both cases, the remote boot failed.

The Pi successfully boots into the UEFI firmware.

Getting the Pi to execute Tinkerbell’s iPXE script

It was very convenient to see that the UEFI firmware also attempts a PXE boot. This allowed me to continue with pointing this stage of the boot to Tinkerbell’s iPXE binaries. For the most part, these are standard iPXE binary builds. The only difference is that Tinkerbell introduced a user class setting to the DHCP requests the iPXE boot program will send, to make those requests easier to work with.

Instructing the UEFI firmware to fetch the iPXE binary from Tinkerbell only needed one additional setting in Dnsmasq:

pxe-service=ARM64_EFI,"EFI Netboot",snp.efi,203.0.113.200

This line sets the boot file to snp.efi and instructs the UEFI firmware to fetch it from Tinkerbell, not Dnsmasq. This is what the exchange looks like:

dnsmasq-dhcp[3309]: 3924602938 available DHCP subnet: 203.0.113.255/255.255.255.0
dnsmasq-dhcp[3309]: 3924602938 vendor class: PXEClient:Arch:00011:UNDI:003000
dnsmasq-dhcp[3309]: 3924602938 PXE(macvlandnsm) e4:5f:01:bc:f4:ce proxy
dnsmasq-dhcp[3309]: 3924602938 tags: macvlandnsm
dnsmasq-dhcp[3309]: 3924602938 bootfile name: snp.efi
dnsmasq-dhcp[3309]: 3924602938 server name: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 next server: 203.0.113.200
dnsmasq-dhcp[3309]: 3924602938 sent size:  1 option: 53 message-type  5
dnsmasq-dhcp[3309]: 3924602938 sent size:  4 option: 54 server-identifier  203.0.113.18
dnsmasq-dhcp[3309]: 3924602938 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
dnsmasq-dhcp[3309]: 3924602938 sent size: 17 option: 97 client-machine-id  00:15:31:c0:00:00:00:00:00:00:00:e4:5f:01...
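
The vendor class lines in the two logs explain why the two boot stages need different pxe-service entries: the Pi’s own firmware announces architecture 00000, while the UEFI firmware announces 00011. Below is a small sketch for pulling that number out of a vendor class string and mapping it to the name Dnsmasq uses in pxe-service. The function names are my own, and only the two architectures seen in this post are mapped.

```shell
#!/bin/sh
# Extract the numeric client architecture from a PXE vendor class string
# such as "PXEClient:Arch:00011:UNDI:003000".
pxe_arch() {
    arch=${1#PXEClient:Arch:}          # drop the fixed prefix
    arch=${arch%%:*}                   # keep only the digits before the next colon
    arch=${arch#"${arch%%[!0]*}"}      # strip leading zeros
    echo "${arch:-0}"                  # an all-zero field means architecture 0
}

# Map the architecture number to the client system name used by pxe-service.
# Only the two values that show up in this post are covered.
csa_name() {
    case $1 in
        0)  echo "x86PC" ;;
        11) echo "ARM64_EFI" ;;
        *)  echo "other ($1)" ;;
    esac
}

for vc in "PXEClient:Arch:00000:UNDI:002001" "PXEClient:Arch:00011:UNDI:003000"; do
    echo "$vc -> $(csa_name "$(pxe_arch "$vc")")"
done
```

The first string is what the Pi 4 firmware sends, which is why the initial pxe-service line uses 0. The second comes from the UEFI firmware and matches ARM64_EFI.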

This got me a little bit further, but ended with the Pi dropping me into the UEFI firmware screen:

A picture of a screen showing the UEFI firmware config screen. Similar to an x86 UEFI menu, it shows information about the Pi like its CPU and RAM. Options include setting the language, and entering submenus for Device Manager, Boot Manager and Boot Maintenance Manager. It doesn't show any indication of why the UEFI menu is shown.

Instead of booting HookOS, I’m ending up in the UEFI config menu.

To fix the issue, I had to tell iPXE where to fetch the iPXE script, which I did with the following lines in the Dnsmasq config:

dhcp-match=tinkerbell, option:user-class, Tinkerbell
pxe-service=tag:tinkerbell,ARM64_EFI,"EFI Netboot IPXE",http://203.0.113.200/auto.ipxe

This had no effect at all, or at least that was what it looked like to me: I just ended up on the same UEFI screen. Right before that, though, an error message flashed by too quickly for me to read. After some futile attempts at changing the pxe-service line, I gave in and connected a keyboard. Pressing Ctrl+B right after the iPXE binary started running got me into an iPXE shell. There I ran the autoboot command and finally got my error. The DHCP response was correct and iPXE was trying to fetch the iPXE script from what seemed to be the right place, but it got a “Connection reset by peer” error. And then it dawned on me: Tinkerbell’s HTTP server wasn’t running on port 80. So the fix was simple. I changed the two lines from above to these:

dhcp-match=tinkerbell, option:user-class, Tinkerbell
dhcp-boot=tag:tinkerbell,"http://203.0.113.200:7171/auto.ipxe",,"{{ .Values.tinkerbellIP }}"

The switch from pxe-service to dhcp-boot was necessary because iPXE was not requesting PXE options in its DHCP request, and consequently, Dnsmasq did not send a PXE answer. Instead, iPXE was just expecting a boot file option to be set.

The auto.ipxe “file” is a clever implementation detail of Tinkerbell that is worth a closer look. It is an iPXE script, meaning a list of iPXE commands executed in batch mode. The script can be found here. Instead of delivering a static script, Tinkerbell dynamically generates the iPXE script for each individual host. The script always does the same thing in principle: it loads the kernel and initramfs, defines the kernel command line and then boots into the kernel. But because it is generated dynamically, the kernel and initramfs can be set individually for every host in the Hardware manifest.
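
To make the idea of a per-host script more concrete, here is a rough sketch that renders such a script from a few parameters. This is not Tinkerbell’s actual template. The base URL, the file names and the kernel command line are placeholders I picked for illustration. Only the iPXE commands kernel, initrd and boot are the real thing.

```shell
#!/bin/sh
# Render a per-host iPXE script (a sketch, not Tinkerbell's real template).
#   $1: base URL of the kernel/initramfs files (placeholder)
#   $2: architecture suffix of the file names (placeholder naming scheme)
#   $3: kernel command line for this host
render_ipxe() {
    base_url=$1
    arch=$2
    cmdline=$3
    cat <<EOF
#!ipxe
kernel ${base_url}/vmlinuz-${arch} initrd=initramfs-${arch} ${cmdline}
initrd ${base_url}/initramfs-${arch}
boot
EOF
}

render_ipxe "http://boot.example.com/hook" aarch64 "console=tty1"
```

Because the script is rendered per request, every value passed to render_ipxe could differ from host to host, which is the flexibility described above.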

Getting Tinkerbell to send the auto.ipxe script

At this point, I was getting errors from Tinkerbell, because I hadn’t created a Hardware object for the Pi yet. I created this one:

apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: testpi
spec:
  metadata:
    instance:
      id: e4:5f:01:bc:f4:ce
      ips:
        - address: 203.0.113.70
      allow_pxe: false
      hostname: testpi
      operating_system:
        distro: "ubuntu"
        version: "24.04"
  disks:
  - device: /dev/sda
  interfaces:
  - dhcp:
      arch: aarch64
      hostname: testpi
      mac: e4:5f:01:bc:f4:ce
      ip:
        address: 203.0.113.70
        netmask: 255.255.255.0
      name_servers:
      - 10.86.25.254
      uefi: true
    netboot:
      allowPXE: false
      allowWorkflow: true
  userData: |
    #cloud-config
    packages:
      - openssh-server
      - python3
      - sudo
    ssh_pwauth: false
    disable_root: true
    allow_public_ssh_keys: false
    timezone: "Europe/Berlin"
    users:
      - name: imhotep
        shell: /bin/bash
        ssh_authorized_keys:
          - from="192.0.2.100" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOaxn8l16GNyBEgYzWO0BAko9fw8kkIq9tbels3hXdUt user@foo
        sudo: ALL=(ALL:ALL) ALL
    runcmd:
      - systemctl enable ssh.service
      - systemctl start ssh.service
    power_state:
      delay: 2
      timeout: 2
      mode: reboot    

Note especially the spec.interfaces.netboot.allowPXE: false option. This tells Tinkerbell that it shouldn’t answer the host’s DHCP requests while it is PXE booting. I had to set the option, because by default, Tinkerbell would answer the initial DHCP request with a DHCP reply instructing the Pi to download the iPXE binary from Tinkerbell straight away. This works with normal PXE boot, but the Pi’s network boot is a bit special. It has to get stuff like the config.txt file from the TFTP server as well, and Tinkerbell can’t do that. Yet. I will go into a bit more detail at the end.

But even with this config set, the auto.ipxe script was not getting delivered. This time Tinkerbell output the following error message:

{
"time":"2025-06-26T19:02:24.911697105Z",
"level":"0",
"caller":"smee/internal/ipxe/script/ipxe.go:169",
"msg":"the hardware data for this machine, or lack there of, does not allow it to pxe",
"service":"smee",
"client":"203.0.113.70:42502",
"error":null
}
{
"time":"2025-06-26T19:02:24.911752728Z",
"level":"0",
"caller":"smee/internal/ipxe/http/middleware.go:37",
"msg":"response",
"service":"smee",
"method":"GET",
"uri":"/auto.ipxe",
"client":"203.0.113.70",
"duration":160896,
"status":404
}

So because allowPXE is false for the Pi, it also doesn’t get to download the auto.ipxe script. My ultimate solution was to completely disable DHCP in Tinkerbell and then set allowPXE: true for the Pi. I was able to disable DHCP completely with this setting in Tinkerbell’s values.yaml:

deployment:
  envs:
    smee:
      dhcpEnabled: false

And after that, the Pi was able to boot into HookOS without further issue. I will talk about why this is suboptimal in the last section of this post.

Provisioning the Pi

Setting up the actual provisioning went rather smoothly after all of that. I followed the same approach as I did for the VM in the previous post:

apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: pi-template
spec:
  data: |
    name: pi-template
    version: "0.1"
    global_timeout: 600
    tasks:
      - name: "os installation"
        worker: "{{`{{.machine_mac}}`}}"
        volumes:
          - /dev:/dev
          - /dev/console:/dev/console
        actions:
          - name: "install ubuntu"
            image: quay.io/tinkerbell/actions/image2disk:latest
            timeout: 900
            environment:
                IMG_URL: https://s3.example.com/public/images/mypi-image.img
                DEST_DISK: /dev/sda
                COMPRESSED: false
          - name: "add cloud-init config"
            image: quay.io/tinkerbell/actions/writefile:latest
            timeout: 90
            environment:
              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
              DEST_PATH: /etc/cloud/cloud.cfg.d/10_tinkerbell.cfg
              DIRMODE: "0700"
              FS_TYPE: ext4
              GID: "0"
              MODE: "0600"
              UID: "0"
              CONTENTS: |
                datasource:
                  Ec2:
                    metadata_urls: ["http://203.0.113.200:7172"]
                    strict_id: false
                manage_etc_hosts: localhost
                warnings:
                  dsid_missing_source: off
          - name: "add cloud-init ds-identity"
            image: quay.io/tinkerbell/actions/writefile:latest
            timeout: 90
            environment:
              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 2 }}` }}
              FS_TYPE: ext4
              DEST_PATH: /etc/cloud/ds-identify.cfg
              UID: 0
              GID: 0
              MODE: 0600
              DIRMODE: 0700
              CONTENTS: |
                datasource: Ec2
          - name: "remove default user data"
            image: quay.io/tinkerbell/actions/writefile:latest
            timeout: 90
            environment:
              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
              FS_TYPE: vfat
              DEST_PATH: /user-data
              UID: 0
              GID: 0
              MODE: 0600
              DIRMODE: 0700
              CONTENTS: |
                # Removed during provisioning
          - name: "remove default meta data"
            image: quay.io/tinkerbell/actions/writefile:latest
            timeout: 90
            environment:
              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
              FS_TYPE: vfat
              DEST_PATH: /meta-data
              UID: 0
              GID: 0
              MODE: 0600
              DIRMODE: 0700
              CONTENTS: |
                # Removed during provisioning
          - name: "remove default network config"
            image: quay.io/tinkerbell/actions/writefile:latest
            timeout: 90
            environment:
              DEST_DISK: {{ `{{ formatPartition ( index .Hardware.Disks 0 ) 1 }}` }}
              FS_TYPE: vfat
              DEST_PATH: /network-config
              UID: 0
              GID: 0
              MODE: 0600
              DIRMODE: 0700
              CONTENTS: |
                # Removed during provisioning
          - name: "reboot"
            image: ghcr.io/jacobweinstock/waitdaemon:latest
            timeout: 90
            pid: host
            command: ["reboot"]
            environment:
              IMAGE: alpine
              WAIT_SECONDS: 10
            volumes:
              - /var/run/docker.sock:/var/run/docker.sock    

The one noteworthy change here is in the files I’m removing/emptying. I had created the VM image myself via the Ubuntu installer and Packer. For the Pi, I was able to use the official Ubuntu preinstalled Raspberry Pi image, but that image ships default user-data, meta-data and network-config files. In the Pi image, those are located on the boot partition:

  • /user-data
  • /meta-data
  • /network-config

With these files removed, the Ubuntu image properly made use of the metadata server Tinkerbell provides and executed the user-data instructions delivered by it and defined in the Hardware object of the Raspberry Pi.
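
The template addresses both partitions through the formatPartition helper: partition 1 is the vfat boot partition holding the three files, partition 2 the ext4 root filesystem. The sketch below shows what I assume the helper boils down to, namely the usual Linux partition naming. It is a simplified shell imitation, not Tinkerbell’s code, and the function name is only borrowed for illustration.

```shell
#!/bin/sh
# Simplified imitation of the formatPartition template helper.
# Assumption: it follows the usual Linux naming, where NVMe devices get a
# "p" between the device name and the partition number.
format_partition() {
    case $1 in
        /dev/nvme*) echo "${1}p${2}" ;;
        *)          echo "${1}${2}" ;;
    esac
}

# The Pi's USB SSD shows up as /dev/sda in the Hardware object.
echo "boot partition: $(format_partition /dev/sda 1)"
echo "root partition: $(format_partition /dev/sda 2)"
```

For the Hardware object above, the writefile actions for the three default files therefore target /dev/sda1, while the cloud-init configuration lands on /dev/sda2.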

So now I finally had a fully provisioned Pi, without any manual intervention:

A picture of a screen showing the final lines of an Ubuntu boot. It shows the Ubuntu version as 24.04.2 LTS, some lines indicating that cloud-init ran successfully and finally a login prompt for the host testpi.

Final successful Ubuntu provisioned boot.

Next steps

The next phase of the Tinkerbell project will require me to don my thinking cap and probably try to write some Go code. As I’ve shown above, booting a Pi 4 is possible. But provisioning a Pi 5 the same way is not, because the Pi 5 UEFI project seems to be dead, judging by the archived repo. Additionally, the approach I’ve shown above requires DHCP to be completely switched off in Tinkerbell: I needed to enable allowPXE to get the auto.ipxe script, but at the same time Tinkerbell cannot provide the files necessary for the initial PXE boot into UEFI/iPXE for a Pi.

But there might be a way around all of these issues, which should also work with the Pi 5: Booting into HookOS directly, skipping UEFI and iPXE. This should be possible by setting HookOS’ kernel and initramfs in the config.txt file for direct boot via the Pi’s firmware. The downside of this approach is that I’m losing Tinkerbell’s ability to adapt e.g. the kernel command line dynamically, as it does when booting through the iPXE script. Tinkerbell would only enter the picture after HookOS is already booted up.

Then there’s also the issue of diskless hosts, which netboot not only for their initial provisioning via Tinkerbell but on every boot. The biggest issue here is how to distinguish between the two cases. When the host needs to be provisioned, it needs to be told to PXE boot into HookOS. If it is just doing a normal boot, it needs to boot into its own kernel and initramfs. The best decision point I can imagine for that is Tinkerbell’s workflows. They can be in different states, and they’re set to a “done” state when all of their tasks have been executed successfully for a given host. So whenever a DHCP request arrives, I could check whether that host has any pending workflows. If it does, I tell it to boot into HookOS, and otherwise I have it continue the boot normally.
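
As a thought experiment, the decision could look roughly like the sketch below. Nothing like this exists yet. The state names pending, running and success are hypothetical stand-ins for whatever Tinkerbell reports, and fetching the states for a host is left out entirely.

```shell
#!/bin/sh
# Decide what a netbooting host should boot, given the states of its workflows.
# The state names are hypothetical; failed workflows are deliberately not
# handled here and fall through to a normal boot.
boot_target() {
    for state in "$@"; do
        case $state in
            pending|running)
                # At least one workflow still has work to do: provision.
                echo "hookos"
                return 0 ;;
        esac
    done
    # No unfinished workflows (or none at all): boot the host's own kernel.
    echo "local"
}

boot_target success pending   # prints "hookos"
boot_target success           # prints "local"
boot_target                   # prints "local"
```

Whatever answers the DHCP request would then hand out either the HookOS boot parameters or the host’s regular kernel and initramfs, depending on the result.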

Lots to think about. But I’m enjoying it - there’s certainly been a lot more “lab” in my Homelab than usual. 😁