In this post, I will describe my failed attempts at booting Tinkerbell’s in-memory HookOS directly on a Pi 4, without iPXE or UEFI.
This is part 5 of my Tinkerbell series.
In my previous post, I described how I provisioned a Pi 4 using Tinkerbell’s standard way via UEFI and iPXE. This was a complicated and convoluted process, requiring heavy use of Dnsmasq on the side and bouncing between requests to said Dnsmasq and Tinkerbell itself. In the end, I was only able to do it after completely switching off Tinkerbell’s DHCP functionality. I wasn’t particularly fond of that option, because I quite liked how it worked for provisioning the VM in my first experiments. I didn’t want to completely switch off DHCP in Tinkerbell just because of the Pi 4.
Another pretty big issue was the Pi 5. From everything I could see, the Pi 5 UEFI project is dead right now. So working with iPXE/UEFI was not possible for the Pi 5 anyway, and I’m already running three of those and I’m planning to add a fourth.
The potential solution with direct boot
So I took a look at what Tinkerbell’s provisioning actually does. Its core part is the Tink workflow engine, running on HookOS. This is Tinkerbell’s in-memory provisioning OS. The only task it has is to provide a Linux environment with Docker to run the provisioning tasks. And it’s not a special Linux really, just one which runs entirely from the initramfs.
So the only thing I really needed was the ability to boot into the HookOS kernel, which again, isn’t actually anything special, and then run HookOS’ initramfs. And that’s already possible with the Pi’s netboot mechanism. You can provide the name of a kernel and an initramfs, and the Pi’s firmware will download those from the TFTP server it receives during DHCP discovery. This is at least a simpler approach than needing to work with UEFI and iPXE. And it has the advantage that it should also work with the Pi 5.
There are a couple of additional issues with this solution, mainly that I would still like a tighter connection with Tinkerbell’s DHCP side. But for now, I’m mostly interested in seeing what my overall options with Tinkerbell and the Raspberry Pi’s netboot process are. Then I will think a bit more about potential changes I could propose to the Tinkerbell project.
Trying the official HookOS release
The newest HookOS release is v0.10.0, so that is the one I picked. Besides the standard x86_64 and aarch64 kernels, HookOS also provides a version with Armbian’s Raspberry Pi kernel and an initramfs build for aarch64. I started with that one.
In preparation, I downloaded the hook_armbian-bcm2711-current.tar.gz file from HookOS, which contains the kernel and initramfs. But this leaves out some Pi-specific files which are also needed when netbooting a Pi. I decided to get those files from Armbian as well, namely this page. I chose the “Minimal/IoT” image. Then I mounted the image locally to get at the content of the boot partition:
losetup -f --show -P Armbian_25.5.1_Rpi4b_noble_current_6.12.28_minimal.img
mount /dev/loop0p1 /mnt/temp/
This then allowed me to copy a couple of files, namely:
bcm2711-rpi-4-b.dtb (that was the only dtb I copied, because I’m working only with a Pi 4b for now)
cmdline.txt
config.txt
fixup4*
start4*
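The copy step above is easy to script. The following is a minimal sketch, assuming the boot partition is mounted as shown earlier; the function name and the TFTP root path are made up for illustration and are not taken from the setup described here.

```shell
#!/bin/sh
# Sketch: copy the Pi-specific boot files from a mounted Armbian boot
# partition into a TFTP root. Both paths are passed in by the caller.
copy_pi_boot_files() {
    src="$1"   # mounted boot partition, e.g. /mnt/temp
    dst="$2"   # TFTP root (hypothetical, e.g. /srv/tftp)
    mkdir -p "$dst" || return 1
    # The globs are left unquoted on purpose so fixup4* and start4* expand.
    for f in "$src"/bcm2711-rpi-4-b.dtb "$src"/cmdline.txt "$src"/config.txt \
             "$src"/fixup4* "$src"/start4*; do
        # Skip patterns that matched nothing instead of failing on them.
        [ -e "$f" ] || continue
        cp "$f" "$dst"/ || return 1
    done
}

# Usage (paths are examples):
# copy_pi_boot_files /mnt/temp /srv/tftp
```

Only the Pi 4 device tree is copied, mirroring the list above; other boards would need their own dtb added to the loop.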
The next challenge was the kernel command line. Tinkerbell provides a few important values through the kernel command line when booting via its iPXE script, so I booted the Pi with iPXE again and wanted to copy the kernel command line. But I did not have direct access to the Pi from my desktop, because HookOS doesn’t run SSH by default. I was using it through a separate keyboard and display.
At that point I had to sit back for a few minutes and consider my life choices a bit. Because with all the services I’ve got running in my Homelab, all the Kubernetes clusters, the Ceph storage clusters, the myriad of apps - I somehow did not have a no-frills, zero config way to share a copy+paste of the kernel command line from one host to another. I was a bit disappointed in myself.
But then, I had an excellent idea, if I may say so myself: Netcat! It can do simple TCP transfers. So I launched this command on my desktop:
nc -l -p 1234 > out.txt
And then, on the Pi booted into HookOS, I ran this:
dmesg > out.txt
nc -w 3 198.51.100.25 1234 < out.txt
And just like that, I had the data available on my desktop. I’m honestly a bit enamored with myself for coming up with this rather simple and expedient solution. 😁
The important bits of the command line looked like this:
tink_worker_image=ghcr.io/tinkerbell/tink-agent:v0.18.3-b817f7f2 facility= syslog_host=203.0.113.200 grpc_authority=203.0.113.200:42113 tinkerbell_tls=false tinkerbell_insecure_tls=false worker_id=e4:5f:01:bc:f4:ce hw_addr=e4:5f:01:bc:f4:ce modules=loop,squashfs,sd-mod,usb-storage initrd=initramfs-aarch64
I added those Tinkerbell-specific options to the cmdline.txt file in the TFTP directory and also adapted the config.txt, setting the HookOS kernel and initramfs:
[all]
kernel=vmlinuz-armbian-bcm2711-current
initramfs initramfs-armbian-bcm2711-current followkernel
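For the cmdline.txt side, the Tinkerbell options have to end up on a single line, since the Pi firmware expects the whole command line on one line. A minimal sketch of assembling them per machine follows; the parameter names and values mirror the captured command line above, while the function name and its arguments are invented for this example.

```shell
#!/bin/sh
# Sketch: assemble the Tinkerbell-specific part of cmdline.txt for one Pi.
# tink_cmdline and its argument order are hypothetical helpers.
tink_cmdline() {
    mac="$1"       # MAC address of the Pi, used as worker_id and hw_addr
    tink_ip="$2"   # address of the Tinkerbell server
    # printf repeats the format for every argument, so everything lands
    # on one line, separated by spaces.
    printf '%s ' \
        "tink_worker_image=ghcr.io/tinkerbell/tink-agent:v0.18.3-b817f7f2" \
        "facility=" \
        "syslog_host=${tink_ip}" \
        "grpc_authority=${tink_ip}:42113" \
        "tinkerbell_tls=false" \
        "tinkerbell_insecure_tls=false" \
        "worker_id=${mac}" \
        "hw_addr=${mac}"
    printf '\n'
}

tink_cmdline "e4:5f:01:bc:f4:ce" "203.0.113.200"
```

The output would be appended to whatever options cmdline.txt already carries, keeping the file at exactly one line.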
With all of that done, I booted the Pi up. While the kernel booted and the containerd in the initramfs was also started, there was no shell, and in the Tinkerbell logs I did not see any attempt by the Pi to contact Tinkerbell. I did not even see an attempt to get an IP via DHCP after the netboot part was done. The only error message I could see on the small screen I was using was this one:
Failed to read service spec error "open /containers/services/getty/config.json: No such file or directory"
Considering that getty is what provides the shell, at least I now knew why I wasn’t getting a prompt. So I unpacked the initramfs with this command to make sure the file was actually there:
gunzip -c initramfs-armbian-bcm2711-current | cpio -i
And yes, the file does actually exist in the initramfs. So what’s going on here? My main problem was that I wasn’t getting a shell, so I didn’t have any good way to get at the rest of the boot messages, to see whether there was another error. So I went for a bad way instead: Filming the small screen I had connected to the Pi. As you might imagine, this wasn’t a great solution. One, I had to transfer the video to my desktop via Nextcloud, because I couldn’t properly read anything on my phone’s screen. Then there’s the problem that the video is taken at a certain framerate, and sometimes the logs scrolled by too quickly to catch everything.
This is what all too much of the video looked like:

Not really readable output from trying to take a video of the boot process.
But I still got a bit more out of it, most importantly this message:
rootfs image is not initramfs (read error): looks like an initrd
But that was less than helpful. At least on my desktop, the initramfs looked perfectly fine, no issues packaging it up at all.
But while fudging with kernel command line options and the config.txt content, to no avail at all, I suddenly saw the console= option. And realized that I could make my life at least a bit easier. I got out my trusty USB-to-Serial adapter and followed this tutorial to get it attached to the Pi. After adding console=serial0,115200 to the kernel command line, I was then able to connect to the Pi via serial console. I used minicom on my desktop, where the serial adapter showed up as /dev/ttyUSB0:
minicom -b 115200 -D /dev/ttyUSB0
And just like that, I had all of the boot time messages on my desktop and no longer needed to film the boot process.
But I still wasn’t really getting anywhere. I also tried a few other kernels, thinking that there might be something wrong with the HookOS kernel. I tried for example the Ubuntu kernel I use for my production Pis, but to no avail. The error stayed the same.
So I decided I would dig into HookOS and LinuxKit, which HookOS is based on.
Trying a newer kernel
Still having no idea what was going on, I decided to try a newer kernel. The last HookOS release was from November 2024, so I figured perhaps something had changed since then.
And at this point, I have to send a really big kudos to the Tinkerbell team for HookOS’ builds. I was perfectly prepared to spend some time to get my VM set up properly to actually build HookOS successfully. But I didn’t need to. Quite to the contrary. Most dependencies were automatically installed, and everything went very smoothly. I was rather impressed. 👍
So for the experiment, I cloned the HookOS repo locally and switched into it. Then I had to manually install a few tools which were missing:
apt install docker.io docker-buildx
Then I just executed the build script:
./build.sh kernel armbian-bcm2711-current
This installed a few additional dependencies via apt, and then built an OCI image with the newest Armbian Pi kernel. Then, to build the full HookOS, including device trees and initramfs, I ran this command:
./build.sh build armbian-bcm2711-current
At the time I executed the commands, the Armbian kernel I got was 6.12.35-S8292-Dbdda-P0000-Ce6dbH2313-HK01ba-Vc222-Ba566-R448a.
The build results in an out/ directory in the local dir, which contains the device tree, Raspberry Pi overlays and kernel+initramfs. I copied it all into my TFTP directory and tried to boot the Pi again. But yet again, I did get a boot and containerd startup, but no prompt.
In a last desperate attempt, I tried with a 5.15 kernel from Armbian’s kernel-bcm2711-legacy, but that also ran into exactly the same issue.
Mangling HookOS
After a while of fruitlessly playing around, I started reading more and more Google hits talking about truncation of initramfs by some implementations. So I decided to try to reduce the size by re-compressing the initramfs with zstd. That only reduced the size of the initramfs down to 122 MB, but it did something more important: It confirmed the truncation theory via this kernel message:
rootfs image is not initramfs (ZSTD-compressed data is truncated); looks like an initrd
[...]
RAMDISK: zstd image found at block 0
RAMDISK: incomplete write (-28 != 131072)
This error indicates that the initramfs compression was correctly recognized, but the data was truncated. I finally had proper proof that truncation was the problem.
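The re-compression step mentioned above could look roughly like the following. This is a sketch under assumptions: it presumes the original initramfs is gzip-compressed (as the gunzip call earlier suggests), that gzip and zstd are installed, and that the target kernel supports zstd-compressed initramfs images. The function name and the output file name are invented.

```shell
#!/bin/sh
# Sketch: re-compress a gzip-compressed initramfs with zstd.
recompress_initramfs() {
    in="$1"    # gzip-compressed initramfs
    out="$2"   # zstd-compressed result
    # -19: high compression level, -T0: use all cores,
    # -f: overwrite an existing output file, -q: no progress output
    gzip -dc < "$in" | zstd -19 -T0 -f -q -o "$out"
}

# Usage (file names are examples):
# recompress_initramfs initramfs-armbian-bcm2711-current initramfs-zstd
```

The config.txt initramfs line would then have to point at the new file.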
For further investigation, I adapted HookOS a bit to ensure that Getty gets launched early in the boot process, in the hope that I would get a prompt and could look around.
LinuxKit, the dockerized Linux distro HookOS is built upon, has a template file which describes what to put into the initramfs. The one for HookOS looks like this:
kernel:
  image: "${HOOK_KERNEL_IMAGE}"
  cmdline: "464vn90e7rbj08xbwdjejmdf4it17c5zfzjyfhthbh19eij201hjgit021bmpdb9ctrc87x2ymc8e7icu4ffi15x1hah9iyaiz38ckyap8hwx2vt5rm44ixv4hau8iw718q5yd019um5dt2xpqqa2rjtdypzr5v1gun8un110hhwp8cex7pqrh2ivh0ynpm4zkkwc8wcn367zyethzy7q8hzudyeyzx3cgmxqbkh825gcak7kxzjbgjajwizryv7ec1xm2h0hh7pz29qmvtgfjj1vphpgq1zcbiiehv52wrjy9yq473d9t1rvryy6929nk435hfx55du3ih05kn5tju3vijreru1p6knc988d4gfdz28eragvryq5x8aibe5trxd0t6t7jwxkde34v6pj1khmp50k6qqj3nzgcfzabtgqkmeqhdedbvwf3byfdma4nkv3rcxugaj2d0ru30pa2fqadjqrtjnv8bu52xzxv7irbhyvygygxu1nt5z4fh9w1vwbdcmagep26d298zknykf2e88kumt59ab7nq79d8amnhhvbexgh48e8qc61vq2e9qkihzt1twk1ijfgw70nwizai15iqyted2dt9gfmf2gg7amzufre79hwqkddc1cd935ywacnkrnak6r7xzcz7zbmq3kt04u2hg1iuupid8rt4nyrju51e6uejb2ruu36g9aibmz3hnmvazptu8x5tyxk820g2cdpxjdij766bt2n3djur7v623a2v44juyfgz80ekgfb9hkibpxh3zgknw8a34t4jifhf116x15cei9hwch0fye3xyq0acuym8uhitu5evc4rag3ui0fny3qg4kju7zkfyy8hwh537urd5uixkzwu5bdvafz4jmv7imypj543xg5em8jk8cgk7c4504xdd5e4e71ihaumt6u5u2t1w7um92fepzae8p0vq93wdrd1756npu1pziiur1payc7kmdwyxg3hj5n4phxbc29x0tcddamjrwt260b0w"
init:
  # this init container sha has support for volumes
  - linuxkit/init:872d2e1be745f1acb948762562cf31c367303a3b
  - "${HOOK_CONTAINER_RUNC_IMAGE}"
  - "${HOOK_CONTAINER_CONTAINERD_IMAGE}"
  - linuxkit/ca-certificates:v1.0.0
  - linuxkit/firmware:24402a25359c7bc290f7fc3cd23b6b5f0feb32a5 # "Some" firmware from Linuxkit pkg; see https://github.com/linuxkit/linuxkit/blob/master/pkg/firmware/Dockerfile
  - "${HOOK_CONTAINER_EMBEDDED_IMAGE}"
onboot:
  - name: dhcpcd-once
    image: linuxkit/dhcpcd:v1.0.0
    command: [ "/etc/ip/dhcp.sh", "true" ] # 2nd paramter is one-shot true/false: true for onboot, false for services
    #capabilities.add:
    #  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC's that boot in 1970
    capabilities:
      - all
    binds.add:
      - /var/lib/dhcpcd:/var/lib/dhcpcd
      - /run:/run
      - /etc/ip/dhcp.sh:/etc/ip/dhcp.sh
      - /dhcpcd.conf:/dhcpcd.conf
    runtime:
      mkdir:
        - /var/lib/dhcpcd
services:
  - name: udev # as a service; so system reacts to changes in devices
    image: "${HOOK_CONTAINER_UDEV_IMAGE}"
    command: [ "/lib/systemd/systemd-udevd", "--debug" ]
    capabilities: [ all ]
    binds: [ /dev:/dev, /sys:/sys, /lib/modules:/lib/modules ]
    rootfsPropagation: shared
    net: host
    pid: host
    devices:
      - path: all
        type: b
      - path: all
        type: c
  - name: getty
    image: linuxkit/getty:v1.0.0
    capabilities:
      - all
    binds.add:
      - /etc/profile.d/local.sh:/etc/profile.d/local.sh
      - /etc/securetty:/etc/securetty
      - /etc/motd:/etc/motd
      - /etc/os-release:/etc/os-release
      - /:/host_root
      - /run:/run
      - /dev:/dev
      - /dev/console:/dev/console
      - /usr/bin/nerdctl:/usr/bin/nerdctl
    env:
      - INSECURE=true
    devices:
      - path: all
        type: b
  - name: hook-docker
    image: "${HOOK_CONTAINER_DOCKER_IMAGE}"
    capabilities:
      - all
    net: host
    pid: host
    mounts:
      - type: cgroup2
        options: [ "rw", "nosuid", "noexec", "nodev", "relatime" ]
        destination: /sys/fs/cgroup
    binds.add:
      - /dev/console:/dev/console
      - /dev:/dev
      - /etc/resolv.conf:/etc/resolv.conf
      - /lib/modules:/lib/modules
      - /var/run/docker:/var/run
      - /var/run/images:/var/lib/docker
      - /var/run/worker:/worker
      - /:/host_root
    runtime:
      mkdir:
        - /var/run/images
        - /var/run/docker
        - /var/run/worker
    devices:
      - path: all
        type: b
      - path: all
        type: c
  - name: hook-bootkit
    image: "${HOOK_CONTAINER_BOOTKIT_IMAGE}"
    capabilities:
      - all
    net: host
    mounts:
      - type: cgroup2
        options: [ "rw", "nosuid", "noexec", "nodev", "relatime" ]
        destination: /sys/fs/cgroup
    binds:
      - /var/run/docker:/var/run
    runtime:
      mkdir:
        - /var/run/docker
  - name: dhcpcd-daemon
    image: linuxkit/dhcpcd:v1.0.0
    command: [ "/etc/ip/dhcp.sh", "false" ] # 2nd paramter is one-shot true/false: true for onboot, false for services
    #capabilities.add:
    #  - CAP_SYS_TIME # for ntp one-shot no-max-offset after ntpd, for hardware missing RTC's that boot in 1970
    capabilities:
      - all
    binds.add:
      - /var/lib/dhcpcd:/var/lib/dhcpcd
      - /run:/run
      - /etc/ip/dhcp.sh:/etc/ip/dhcp.sh
      - /dhcpcd.conf:/dhcpcd.conf
    runtime:
      mkdir:
        - /var/lib/dhcpcd
files:
  - path: etc/os-release
    mode: "0444"
    contents: |
      NAME="HookOS"
      VERSION=${HOOK_VERSION}
      ID=hookos
      VERSION_ID=${HOOK_VERSION}
      PRETTY_NAME="HookOS ${HOOK_KERNEL_ID} v${HOOK_VERSION}/k${HOOK_KERNEL_VERSION}"
      ANSI_COLOR="1;34"
      HOME_URL="https://github.com/tinkerbell/hook"
  - path: etc/securetty
    contents: |
      console
      ttyUSB0
      ttyUSB1
      ttyUSB2
The above is only supposed to serve as an example, so I removed a lot of lines and comments. If you’d like to have a look at the full file, have a look at the GitHub repo.
My idea was to see whether I could get Getty to be put into the root of the initramfs, instead of having it launched as a container. Looking at the YAML file, I decided I would just try to move it from the services: list to the init: list, like this:
init:
  - linuxkit/getty:v1.0.0
And that actually worked! The other issues were still there - the image was still truncated, but now Getty was coming early enough in the image to be in the non-truncated part. I was now getting a prompt when booting into the initramfs.
Looking around, I still couldn’t find any other obvious errors, just more boot services which failed to start because their config.json files became victims of the truncation. But I at least had another piece of proof that truncation was happening, as I checked the total size of the unpacked initramfs on my VM, and it was 603 MB. Checking the / size in the booted initramfs only showed 404 MB total. Weirdly, part of that 404 MB was a 90 MB initrd.img file in / which I couldn’t make heads or tails of. The file definitely wasn’t from the actual initramfs, and I wasn’t able to figure out where it came from or what was in it from Google either.
Anyone got any idea what that initrd.img file suddenly appearing in my initramfs might be?
At this point it was pretty clear that I had a truncation problem. The next question was: where?
Figuring out who’s truncating
Initial searches pointed me towards TFTP as the culprit. The Wikipedia article has this to say:
The original protocol has a transfer file size limit of 512 bytes/block x 65535 blocks = 32 MB. In 1998 this limit was extended to 65535 bytes/block x 65535 blocks = 4 GB by TFTP Blocksize Option RFC 2348. […] If TFTP packets should be kept within the standard Ethernet MTU (1500), the blocksize value is calculated as 1500 minus headers of TFTP (4 bytes), UDP (8 bytes) and IP (20 bytes) = 1468 bytes/block, this gives a limit of 1468 bytes/block x 65535 blocks = 92 MB. Today most servers and clients support block number roll-over (block counter going back to 0 or 1 after 65535) which gives an essentially unlimited transfer file size.
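The limits in that quote are plain multiplication and can be reproduced in the shell. The sketch below assumes a 16-bit block counter without roll-over, which leaves 65535 usable blocks; the function name is made up.

```shell
#!/bin/sh
# Worked arithmetic for the TFTP size limits quoted above, assuming a
# block counter that does not roll over (65535 usable blocks).
tftp_limit_bytes() {
    blocksize="$1"
    echo $(( blocksize * 65535 ))
}

tftp_limit_bytes 512     # original protocol: 33553920 bytes (~32 MiB)
tftp_limit_bytes 1468    # fits a 1500 byte MTU: 96205380 bytes (~92 MiB)
tftp_limit_bytes 65535   # RFC 2348 maximum block size: 4294836225 bytes (~4 GiB)
```

Note that the "MB" and "GB" figures in the quote are binary units: 96205380 bytes is about 91.75 MiB, or roughly 96.2 MB in decimal units.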
So it looked like, unless block number roll-over was implemented in the Pi firmware, the maximum file size would be 92 MB. To try to verify that, I took a tcpdump from the transfer of a 155 MB initramfs. Here is the option acknowledgment packet:
Acknowledged options for the initramfs transfer
And here is the end of the transmission:
Final data packet of the TFTP transfer for the initramfs
The full file size was announced up front, as the tsize option in the option acknowledgment at the start shows. In addition, a lot more blocks (105738) were transferred than the max block number of 65535. The actual block number of the last block was 40202, which indicates that the previously mentioned block number roll-over was working as intended. Overall, it did look like the entire file got transferred correctly.
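The relationship between the total block count and the block number of the last packet can be checked with a modulo. This sketch assumes the counter wraps from 65535 back to 0; the variant that wraps to 1 is not modelled, and the function name is invented.

```shell
#!/bin/sh
# Sketch: block number seen on the wire for the Nth data block when the
# 16-bit counter wraps from 65535 back to 0.
wire_block_number() {
    total_blocks="$1"
    echo $(( total_blocks % 65536 ))
}

wire_block_number 105738   # prints 40202, matching the capture
```

If the negotiated block size was 1468 bytes (an assumption, as the value is only visible in the capture), 105738 blocks correspond to at most 105738 x 1468 = 155223384 bytes, which is in line with the 155 MB initramfs.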
So the next possibility was that there’s something going wrong after the transfer.
For that, I had to have a look at the Pi’s early boot process. First, I enabled the BOOT_UART=1 option. This option is in the Pi’s firmware config stored in EEPROM, so it needs to be set via the rpi-eeprom-config script from a running Linux; it cannot be set via the config.txt file. Once I had that, I got the first disappointment, as the output just stopped past this point:
TFTP_GET: aa:ce:d5:6e:90:cd 203.0.113.18 start4.elf
RX: 12 IP: 0 IPV4: 10 MAC: 10 UDP: 10 UDP RECV: 10 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
TFTP: complete 2256224
RX: 14 IP: 0 IPV4: 12 MAC: 12 UDP: 12 UDP RECV: 12 IP_CSUM_ERR: 0 UDP_CSUM_ERR: 0
Read start4.elf bytes 2256224 hnd 0x0
[...]
Starting start4.elf @ 0xfec00200 partition -1
It only started up again when the kernel started booting. To get output from the start4.elf execution, I had to add another option, uart_2ndstage. This option can luckily be set in the config.txt file, so no further trip into Linux was necessary.
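Setting the second option in the netboot config.txt can be done idempotently with a few lines of shell. This is a sketch with a made-up function name; it assumes the file ends with a newline and that its last section is [all], so an appended line applies to every board.

```shell
#!/bin/sh
# Sketch: make sure uart_2ndstage=1 is set in a config.txt.
# BOOT_UART=1 is not handled here, because it lives in the EEPROM
# bootloader config rather than in config.txt.
enable_uart_2ndstage() {
    cfg="$1"   # path to the config.txt in the TFTP directory
    # Leave the file alone if the option is already enabled.
    grep -q '^uart_2ndstage=1$' "$cfg" 2>/dev/null && return 0
    printf 'uart_2ndstage=1\n' >> "$cfg"
}

# Usage (path is an example):
# enable_uart_2ndstage /srv/tftp/config.txt
```

Running it twice leaves a single uart_2ndstage=1 line in the file.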
That then finally delivered the answer to the question of where the truncation happens with this message:
MESS:00:00:55.768976:0: initramfs loaded to 0x29440000 (size 0x5bbfa44)
The size given here is approximately 96 MB. So even though the file was larger, and it looked like it was transferred in its entirety, the Pi’s firmware only loaded 96 MB into memory for the kernel to use. And that’s where the truncation was coming from.
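Decoding the hexadecimal size from that log line gives an exact byte count, which can be compared with the TFTP limit for 1468-byte blocks without roll-over. The 1468-byte block size is an assumption here, since the negotiated value is not stated in the text.

```shell
#!/bin/sh
# Worked arithmetic: decode the size from the firmware log line and compare
# it with 65535 blocks of 1468 bytes.
loaded=$(( 0x5bbfa44 ))      # size reported by the firmware
limit=$(( 1468 * 65535 ))    # no-roll-over limit for 1468-byte blocks

echo "loaded: $loaded bytes"   # loaded: 96205380 bytes
echo "limit:  $limit bytes"    # limit:  96205380 bytes
```

The two numbers are identical. That suggests, without proving it, that the firmware's size accounting stops at 65535 blocks even though the transfer on the wire continues past the roll-over.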
New plan
So I needed a new plan. I think the most reasonable next approach would be to turn the boot process into a two-stage setup. The first stage is a small initramfs, only containing the tools to download the second stage, which will be the full initramfs, and then to pivot into that new image.
One problem is that I don’t want to hardcode the address/name of the initramfs image into the first stage initramfs. One possible option would be to add an option to the kernel command line, as the kernel forwards all options it doesn’t know to the init binary it executes after startup.
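A first-stage init could pick such an option out of the kernel command line with a small helper. The following is a sketch only: hook_stage2_url is a hypothetical option name, the URL is a placeholder, and values containing spaces or glob characters are not handled.

```shell
#!/bin/sh
# Sketch: extract one key=value option from a kernel command line.
cmdline_value() {
    key="$1"
    cmdline="$2"
    # Word-split the command line on purpose.
    for word in $cmdline; do
        case "$word" in
            # Strip everything up to the first "=" and print the value.
            "$key"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# In a first-stage init this would read the real command line:
# stage2_url=$(cmdline_value hook_stage2_url "$(cat /proc/cmdline)")
cmdline_value hook_stage2_url "console=serial0,115200 hook_stage2_url=http://203.0.113.200/initramfs quiet"
```

Reading /proc/cmdline works independently of how the kernel hands unknown options to init, so the same helper can be used from any script in the first stage.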
This approach has the advantage that the overall HookOS process doesn’t need to be changed. The original initramfs can be left entirely untouched and never needs to know that there was another boot stage for specific boards.
Testing that will be my next task. I wanted to get this post out first because I felt that, with both the description of my investigations and the explanation of the solution in it, the blog post would end up as another one of my tomes. And the “posts, not tomes” project is still in effect. 😁