Wherein I update my container image build pipeline in Woodpecker with buildah.

A couple of weekends ago, I naively thought: Hey, how about stepping away from my Tinkerbell experiments for a weekend and quickly setting up a Bookwyrm instance?

As such things tend to turn out, that rookie move turned into a rather deep rabbit hole, mostly on account of my container image build pipeline not really being up to snuff.

The current setup

Before going into details on the problem and ultimate solution, I’d like to sketch out my setup. For a detailed view, have a look at this post.

I’m running Woodpecker CI in my Kubernetes cluster, running container image builds via the docker-buildx plugin.

As I’m running Woodpecker with the Kubernetes backend, each step in a pipeline is executed in its own Pod. Each pipeline, in turn, gets a PersistentVolume mounted, which is shared between all steps of that pipeline. In my pipelines for the container image builds, I only run the docker-buildx plugin as a step: once for PRs, where the image is only built but not pushed, and once for pushes onto main, where the image is built and pushed.
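
To give a rough idea of the starting point, a build step with that plugin looked roughly like this. The registry, repo and image names are placeholders, and the option names are reproduced from memory, so treat this as a sketch rather than a copy-paste config:

  - name: build image
    image: woodpeckerci/plugin-docker-buildx
    settings:
      registry: harbor.example.com
      repo: harbor.example.com/homelab/someimage
      dockerfile: someimage/Containerfile
      platforms: linux/amd64,linux/arm64
      tags: latest
      # For PR pipelines, dry_run keeps the plugin from pushing the result.
      dry_run: true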

The docker-buildx plugin uses Docker’s buildx command, and the BuildKit instance it makes available, to run the image build. Important to note for this post is that BuildKit will happily build multi-arch images. It does so utilizing Qemu for the non-native architectures.
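
For reference, such a multi-arch build boils down to an invocation along these lines, with Qemu transparently handling whichever platform isn’t native (the image name is just a placeholder):

docker buildx build --platform linux/amd64,linux/arm64 -t harbor.example.com/homelab/someimage:0.1 --push .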

Now the issue with that is: the majority of my Homelab consists of Raspberry Pi 4s and a single low-power x86 machine. As you might imagine, that makes emulation very slow, especially on the Pis, which do not have any virtualization instructions.

Now onto the problems I’m having with that setup.

The problems

Let’s start with the problem that triggered this particular rabbit hole: the Bookwyrm image build. I won’t go into the details of the image here; that will come in the next post, when I describe the Bookwyrm setup.

The initial issue was one I had seen before on occasion. In this scenario, the build just gets canceled, with no indication of what went wrong in the Woodpecker logs for the build step. After quite a lot of digging, I finally found these lines in the logs of the machine running one of the failed CI Pods:

kubelet[1088]: I0728 21:07:42.763129    1088 eviction_manager.go:366] "Eviction manager: attempting to reclaim" resourceName="ephemeral-storage"
kubelet[1088]: I0728 21:07:42.763296    1088 container_gc.go:88] "Attempting to delete unused containers"
kubelet[1088]: I0728 21:07:43.131475    1088 image_gc_manager.go:404] "Attempting to delete unused images"
kubelet[1088]: I0728 21:07:43.172539    1088 eviction_manager.go:377] "Eviction manager: must evict pod(s) to reclaim" resourceName="ephemeral-storage"
kubelet[1088]: I0728 21:07:43.174677    1088 eviction_manager.go:395] "Eviction manager: pods ranked for eviction" pods=["woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4","monitoring/grafana-pg-cluster-1","harbor/harbor-pg-cluster-1","harbor/harbor-registry-5cb6c944f5-wm6np","wallabag/wallabag-679f44d9d5-9gl8m","harbor/harbor-portal-578db97949-d52sp","forgejo/forgejo-74948996b9-r94c2","harbor/harbor-jobservice-6cb7fc6d4b-gsswv","harbor/harbor-core-6569d4f449-grtrr","woodpecker/woodpecker-agent-1","taskd/taskd-6f9699f5f4-qkjkr","kube-system/cilium-5tx4t","fluentbit/fluentbit-fluent-bit-frskm","rook-ceph/csi-cephfsplugin-8f4jh","rook-ceph/csi-rbdplugin-cnxfz","kube-system/cilium-envoy-gx7ck"]
crio[780]: time="2025-07-28 21:07:43.179344359+02:00" level=info msg="Stopping container: 7ba324965ba9ed751bd08ac4b464631b2d5dfa05d31f36d98253b68a0d5ec7d0 (timeout: 30s)" id=b69f9664-c0ae-4505-9363-6966afa90b77 name=/runtime.v1.RuntimeService/StopContainer
crio[780]: time="2025-07-28 21:07:43.837431719+02:00" level=info msg="Stopped container 7ba324965ba9ed751bd08ac4b464631b2d5dfa05d31f36d98253b68a0d5ec7d0: woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4/wp-01k194yzh8bg8tzngrf7x6w3k4" id=b69f9664-c0ae-4505-9363-6966afa90b77 name=/runtime.v1.RuntimeService/StopContainer
kubelet[1088]: I0728 21:07:44.097018    1088 eviction_manager.go:616] "Eviction manager: pod is evicted successfully" pod="woodpecker/wp-01k194yzh8bg8tzngrf7x6w3k4"

The Pod had simply run out of space while building the images. The fix was relatively simple, as Woodpecker already provides a pipeline volume. In the case of the Kubernetes backend, that volume is a PVC created per pipeline and then mounted into the Pods of all its steps. In my case, that’s a 50 GB CephFS volume. But I wasn’t using that volume for anything, as the storage for BuildKit, which runs my image builds, was still at the default /var/lib/docker.

So hooray, just move the docker storage to the pipeline volume. I did so by using the parameter the docker-buildx plugin already provides, storage_path:

storage_path: "/woodpecker/docker-storage"

And just like that, I had fixed the problem. Or not.

A screenshot of Woodpecker's CI run UI. It shows that the commit being built is from the 'bookwyrm-image' branch. There are three steps in the pipeline: clone, clone Bookwyrm repo and build image. All three are seemingly successful, with clone taking 16 seconds, clone Bookwyrm repo clocking in at 21s and build image taking 59:21. The overall workflow takes exactly 1h and is red. On the right is the build log for the image, showing a pip invocation. The last few lines indicate the build of the Python wheel for libsass, showing a lot of 'still running...' outputs. The timestamps indicate that by the time of the timeout, the build had been running for 21 minutes.

21 minutes and running for a libsass build.

So much for that all too short moment of triumph. The storage issue was fixed, but the image still could not be built. Looking through previous runs, I saw that the issue wasn’t just the duration of the pip install, but also the initial pull of the Python image. In one of the test builds, the initial pull took over 50 minutes all on its own. Not much time left for the actual setup. The root cause was at least not I/O saturation. The CI run I was looking at ran from 22:25 to 23:25 in the below graph:

A screenshot of a Grafana time series plot. It shows the time from 22:20 to 00:00. There are three plots shown, each representing one of the HDDs in my system. The metric is I/O utilization. At the beginning, it sits at around 20% to 35%, but at 23:23 it goes up to 80%, shortly followed by going up to around 100% for all three HDDs around 23:28. It stays there until around 23:56, when it goes back to below 10%.

I/O utilization on the HDDs in my Ceph cluster, home of the CephFS data pool.

The region of 100% I/O saturation at the end, starting around 23:25, is CephFS cleaning up after the failed pipeline, when the image data had to be removed again. The actual CI run is the 20% to 35% utilization before that.

But I still had the feeling that storage was at least part of the problem. So I tried to use Ceph RBDs instead of CephFS, which also had the advantage of running on SATA SSDs instead of HDDs. But that also did not bring any real improvements. Sure, the build got a lot further and did not spend all its time just extracting the Python image, but it still didn’t finish within the 1h deadline.

I finally ended up figuring that the reason it was still timing out was emulation.

Removing emulation from my image build pipelines

As I’ve mentioned above, the docker-buildx Woodpecker plugin I was using relies on Docker’s BuildKit under the hood. BuildKit can do multi-arch builds out of the box, using Qemu for the non-native architectures. This gets pretty slow on a Raspberry Pi or a low-power x86 machine. So my next plan was to run the builds in parallel, with each architecture built natively on a host of that architecture.

BuildKit and docker-buildx already have support for doing this, via BuildKit’s remote builders. But as per the docker-buildx documentation, this can only be done via SSH. I initially thought that this would work with BuildKit daemons set up to receive external connections, but I was mistaken. Instead of using BuildKit’s built-in remote driver functionality, docker-buildx sets up normal builders with their connection strings pointing to the remote machines for which SSH was configured, and then uses those remote machines’ Docker sockets to run the builds.

After some thinking, I decided to dump docker-buildx altogether. I really didn’t like the idea of somehow setting up inter-Pod SSH connections. That just felt all kinds of wrong.

So I decided: I’ll just do it myself, using Buildah. I’ve had that on my list anyway, so here we go, a bit earlier than planned. Some inspiration for what follows came from this blog post. It uses Tekton as the task engine, not Woodpecker, but it was still a good starting point, especially for answering how to combine the images produced for the different architectures into one manifest.

I started out by building the image for Buildah. The Containerfile ended up looking like this:

ARG alpine_ver
FROM alpine:$alpine_ver

RUN apk --no-cache update\
	&& apk --no-cache add buildah netavark iptables bash jq

I then set up a simple test project in Woodpecker:

  - name: build amd64 image
    image: harbor.example.com/buildah/buildah:latest
    commands:
      - buildah build -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/
    depends_on: []
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: "amd64"

The Containerfile looked something like this:

ARG alpine_ver
FROM alpine:$alpine_ver

RUN apk --no-cache update\
	&& apk --no-cache add buildah

Basically a copy of my Buildah image, just to have something to test. One thing that surprised me: Woodpecker doesn’t actually allow setting a platform per step. So I got lucky that the Kubernetes backend lets me specify the nodeSelector for the step’s Pod.

Right away, the first run produced the following error:

Error: error writing "0 0 4294967295\n" to /proc/16/uid_map: write /proc/16/uid_map: operation not permittedtime="2025-08-07T20:31:45Z" level=error msg="writing \"0 0 4294967295\\n\" to /proc/16/uid_map: write /proc/16/uid_map: operation not permitted"

Clearly, my dream of rootless image builds would not be fulfilled today, so I needed to allow the project to run privileged pipelines. Up to now, I had the docker-buildx plugin in a separate, instance-wide list of privileged plugins. But my new container was, at this point, a simple step, not a plugin.

So my first step was to make my own user an admin, as I had never needed admin privileges in Woodpecker before. I did this via the WOODPECKER_ADMIN environment variable in my values.yaml file for the Woodpecker chart:

server:
  env:
    WOODPECKER_ADMIN: "my-user"

After that, the trusted project settings appeared in the Woodpecker settings page:

A screenshot of Woodpecker's project settings page. It shows the 'Project' tab being selected. Under the 'Trusted' heading, the 'Security' option checkbox is checked. The 'Network' and 'Volumes' options are left unchecked.

Trusted settings in the project configuration of Woodpecker. The options under the ‘Trusted’ heading only show up for admin users.

Enabling the Security option allowed me to run the Buildah containers in privileged mode, by adding the privileged: true option.
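
In the step definition, that looks like this:

  - name: build amd64 image
    image: harbor.example.com/buildah/buildah:latest
    privileged: true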

The next error I got was this one:

Error: 'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver
time="2025-08-07T20:57:11Z" level=warning msg="failed to shutdown storage: \"'overlay' is not supported over overlayfs, a mount_program is required: backing file system is unsupported for this graph driver\""

At this point, my pipeline volume was still on a Ceph RBD, as I had not yet realized that running multiple Buildah steps for the different platforms in parallel would require RWX volumes for the pipelines. Buildah’s storage was still sitting in the container’s own filesystem, which is itself OverlayFS, hence the “OverlayFS on OverlayFS” error above. So I decided that the right solution would be to move the storage onto my pipeline volume, which I did by adding --root /woodpecker to the Buildah command.

And then I got the next one:

STEP 1/2: FROM alpine:3.22.1
Error: creating build container: could not find "netavark" in one of [/usr/local/libexec/podman /usr/local/lib/podman /usr/libexec/podman /usr/lib/podman].  To resolve this error, set the helper_binaries_dir key in the `[engine]` section of containers.conf to the directory containing your helper binaries.

This was fixed rather easily by adding netavark to the Buildah image. I had a similar error next, about iptables not being available. So I installed that one as well.

But that wasn’t all. Oh no, here’s another error:

buildah --root /woodpecker build -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/
STEP 1/2: FROM alpine:3.22.1
WARNING: image platform (linux/arm64/v8) does not match the expected platform (linux/amd64)
STEP 2/2: RUN apk --no-cache update	&& apk --no-cache add buildah
exec container process `/bin/sh`: Exec format error
Error: building at STEP "RUN apk --no-cache update	&& apk --no-cache add buildah": while running runtime: exit status 1

That one confused me a little bit, to be honest. It wasn’t difficult to fix: I just had to add the --platform linux/amd64 option to the Buildah command. What confused me was that Buildah didn’t somehow figure that out for itself.

And this was the point where I realized that my two CI steps, one for amd64, one for arm64, did not run in parallel. The second one only started after the first had failed. One kubectl describe -n woodpecker pods wp-... later, I saw that this was because the Pod which launched second failed to mount the pipeline volume. And that in turn was because I had switched to an SSD-backed Ceph RBD for the volume, to improve speed. But RBDs are, by their nature as block devices, RWO and cannot be mounted by multiple Pods.

I switched the volumes back to CephFS and was met with the same error I had seen previously and “fixed” by moving Buildah’s storage onto the pipeline volume:

time="2025-08-07T21:56:14Z" level=error msg="'overlay' is not supported over <unknown> at \"/woodpecker/overlay\""
Error: kernel does not support overlay fs: 'overlay' is not supported over <unknown> at "/woodpecker/overlay": backing file system is unsupported for this graph driver
time="2025-08-07T21:56:14Z" level=warning msg="failed to shutdown storage: \"kernel does not support overlay fs: 'overlay' is not supported over <unknown> at \\\"/woodpecker/overlay\\\": backing file system is unsupported for this graph driver\""

I’m not sure why it said “unknown”, but the filesystem was CephFS. After some searching, I found out that OverlayFS and CephFS are seemingly incompatible. But the issue was fixable by adding --storage-driver=vfs to the Buildah command. The VFS driver is simpler than the overlay driver and quite a bit slower, but at least it works on CephFS.

And believe it or not, that was the last error. After adding the --storage-driver option, the build ran through cleanly. At this point, my Woodpecker workflow looked like this:

when:
  - event: push
    path:
      - '.woodpecker/testing.yaml'
      - 'testing/*'

variables:
  - &alpine-version '3.22.1'

steps:
  - name: build amd64 image
    image: harbor.example.com/homelab/buildah:0.4
    commands:
      - buildah --root /woodpecker build --storage-driver=vfs --platform linux/amd64 -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/
    depends_on: []
    privileged: true
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: "amd64"
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'
  - name: build arm64 image
    image: harbor.example.com/homelab/buildah:0.4
    commands:
      - buildah --root /woodpecker build --storage-driver=vfs --platform linux/arm64 -t testing:0.1 --build-arg alpine_ver=3.22.1 -f testing/Containerfile testing/
    depends_on: []
    privileged: true
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: "arm64"
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'
  - name: push image
    image: harbor.example.com/homelab/buildah:0.4
    commands:
      - sleep 10000
    depends_on: ["build amd64 image", "build arm64 image"]
    privileged: true
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'

With this configuration, the two builds for amd64 and arm64 are run in parallel, and the final push image step would be responsible for combining the images into a single manifest and pushing it all to my Harbor instance.

I ran a test build and then exec’d into the Pod when the pipeline arrived at the push image step. I used the following commands to combine the manifests and push them up to Harbor:

buildah --root /woodpecker --storage-driver=vfs manifest create harbor.example.com/homelab/testing:0.1
buildah --root /woodpecker --storage-driver=vfs manifest add harbor.example.com/homelab/testing:0.1 3883d7a9067d
buildah --root /woodpecker --storage-driver=vfs manifest add harbor.example.com/homelab/testing:0.1 0130169db3bb
buildah login https://harbor.example.com
buildah --root /woodpecker --storage-driver=vfs manifest push harbor.example.com/homelab/testing:0.1 docker://harbor.example.com/homelab/testing:0.1

The problematic thing about this approach was that I had no way of knowing the correct values for the image names in the manifest add commands, where I used the image hashes in this example. I could of course set separate names for the images, e.g. with the platform in the name, but then I would have to remember to do that every time I create a new pipeline.

Instead, I decided to go one step further and check how painful it would be to turn my simple command-based steps into a Woodpecker plugin.

Building a Woodpecker plugin

And it turns out: it isn’t complicated at all. The docs for new Woodpecker plugins are rather short and sweet. Plugins need to be containerized, and they need to have their program set as the entrypoint of the image. And that’s it. Any options given in the step are forwarded to the step container via environment variables, so there’s nothing special to be done at all.

That was good news, as I was a bit afraid I would have to write some Go. But no, just pure bash was enough.
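
The plugin image is essentially the Buildah image from above with the script set as the entrypoint. A minimal sketch, assuming the script is called plugin.sh:

ARG alpine_ver
FROM alpine:$alpine_ver

RUN apk --no-cache update\
	&& apk --no-cache add buildah netavark iptables bash jq

COPY plugin.sh /usr/local/bin/plugin.sh
ENTRYPOINT ["/usr/local/bin/plugin.sh"]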

In the final result, my pipeline for the testing image will look like this:

when:
  - event: push
    path:
      - '.woodpecker/testing.yaml'
      - 'testing/*'

variables:
  - &alpine-version '3.22.1'
  - &image-version '0.2'
  - &buildah-config
    type: build
    context: testing/
    containerfile: testing/Containerfile
    build_args:
      alpine_ver: *alpine-version

steps:
  - name: build amd64 image
    image: harbor.example.com/homelab/woodpecker-plugin-buildah:latest
    settings:
      <<: *buildah-config
      platform: linux/amd64
    depends_on: []
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: "amd64"
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'
  - name: build arm64 image
    image: harbor.example.com/homelab/woodpecker-plugin-buildah:latest
    settings:
      <<: *buildah-config
      platform: linux/arm64
      type: build
    depends_on: []
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: "arm64"
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'
  - name: push image
    image: harbor.example.com/homelab/woodpecker-plugin-buildah:latest
    settings:
      type: push
      manifest_platforms:
        - "linux/arm64"
        - "linux/amd64"
      tags:
        - latest
        - 1.5
      repo: harbor.example.com/homelab/testing
      username: ci
      password:
        from_secret: container-registry
    depends_on: ["build amd64 image", "build arm64 image"]
    privileged: true
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'

When a Woodpecker plugin is launched, it gets all of the values under settings: handed in as environment variables. A normal key/value pair like type: push appears as PLUGIN_TYPE="push" in the plugin’s container. Lists like tags or manifest_platforms arrive as comma-separated strings, e.g. PLUGIN_TAGS="latest,1.5". Objects are a bit more complicated and are handed over as JSON, e.g. PLUGIN_BUILD_ARGS='{"alpine_ver": "3.22.1"}'.
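
For the build amd64 image step above, the plugin container would therefore see roughly this environment:

PLUGIN_TYPE="build"
PLUGIN_CONTEXT="testing/"
PLUGIN_CONTAINERFILE="testing/Containerfile"
PLUGIN_BUILD_ARGS='{"alpine_ver": "3.22.1"}'
PLUGIN_PLATFORM="linux/amd64"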

First, there is a bit of a preamble in the script, to check whether required config options have been set and Buildah is available:

DATA_ROOT="/woodpecker"

if ! command -v buildah; then
  echo "buildah not found, exiting."
  exit 1
fi

if [[ -z "${PLUGIN_TYPE}" ]]; then
  echo "PLGUIN_TYPE not set, exiting."
  exit 1
fi

Then, depending on the PLUGIN_TYPE variable, either the build or the push function is executed, which either builds the image for a single platform or combines the per-platform images into a single manifest and pushes it to the given registry:

if [[ "${PLUGIN_TYPE}" == "build" ]]; then
  echo "Running build..."
  build || exit $?
elif [[ "${PLUGIN_TYPE}" == "push" ]]; then
  echo "Running push..."
  push || exit $?
else
  echo "Unknown type ${PLUGIN_TYPE}, exiting"
  exit 1
fi

exit 0

And here is the build function:

build() {
  if [[ -z "${PLUGIN_CONTEXT}" ]]; then
    echo "PLUGIN_CONTEXT not set, aborting."
    return 1
  fi

  if [[ -z "${PLUGIN_PLATFORM}" ]]; then
    echo "PLUGIN_PLATFORM not set, aborting."
    return 1
  fi

  if [[ -z "${PLUGIN_CONTAINERFILE}" ]]; then
    echo "PLUGIN_CONTAINERFILE not set, aborting."
    return 1
  fi

  if [[ -n "${PLUGIN_BUILD_ARGS}" ]]; then
    BUILD_ARGS=$(get_build_args "${PLUGIN_BUILD_ARGS}")
  fi

  command="buildah \
--root ${DATA_ROOT} \
build \
--storage-driver=vfs \
--platform ${PLUGIN_PLATFORM} \
-t ${PLUGIN_PLATFORM}:0.0 \
${BUILD_ARGS} \
-f ${PLUGIN_CONTAINERFILE} \
${PLUGIN_CONTEXT} \
"
  echo "Running command: ${command}"

  ${command}
  return $?
}

It again starts out with some checks to make sure the required variables are set. Then it runs the buildah build command just like in the previous, manual setup. The one “special” thing I’m doing here is tagging the new image with the PLUGIN_PLATFORM variable and the :0.0 version. The storage for the builds is entirely temporary, so there will never be multiple versions in it, and this makes the image names predictable in the later push step. At the end of the function’s run, I would have the images linux/amd64:0.0 and linux/arm64:0.0 in the same storage.
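
The get_build_args helper isn’t shown above. A minimal sketch of what it might look like, using the jq that’s already baked into the image to turn the JSON object into --build-arg flags:

get_build_args() {
  # Turns '{"alpine_ver": "3.22.1"}' into '--build-arg alpine_ver=3.22.1'.
  echo "$1" | jq -r 'to_entries | map("--build-arg \(.key)=\(.value)") | join(" ")'
}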

Which then brings us to the push function:

push() {
  if [[ -z "${PLUGIN_REPO}" ]]; then
    echo "PLUGIN_REPO not set, aborting."
    return 1
  fi

  if [[ -z "${PLUGIN_TAGS}" ]]; then
    echo "PLUGIN_TAGS not set, aborting."
    return 1
  else
    TAGS=$(echo "${PLUGIN_TAGS}" | tr ',' ' ')
  fi

  if [[ -z "${PLUGIN_MANIFEST_PLATFORMS}" ]]; then
    echo "PLUGIN_MANIFEST_PLATFORMS not set, aborting."
    return 1
  else
    PLATFORMS=$(echo "${PLUGIN_MANIFEST_PLATFORMS}" | tr ',' ' ')
  fi

  if [[ -z "${PLUGIN_USERNAME}" ]]; then
    echo "PLUGIN_USERNAME not set, aborting."
    return 1
  fi

  if [[ -z "${PLUGIN_PASSWORD}" ]]; then
    echo "PLUGIN_PASSWORD not set, aborting."
    return 1
  fi

  echo "Logging in..."
  buildah login -p "${PLUGIN_PASSWORD}" -u "${PLUGIN_USERNAME}" "${PLUGIN_REPO}" || return 1
  echo "Creating manifest..."
  buildah --root "${DATA_ROOT}" --storage-driver=vfs manifest create newimage || return 1
  for plt in ${PLATFORMS}; do
    echo "Adding platform ${plt}..."
    buildah --root "${DATA_ROOT}" --storage-driver=vfs manifest add newimage "${plt}:0.0" || return 1
  done

  echo "Pushing to registry..."
  for tag in ${TAGS}; do
    buildah --root "${DATA_ROOT}" --storage-driver=vfs manifest push newimage docker://${PLUGIN_REPO}:${tag} || return 1
  done

  buildah logout "${PLUGIN_REPO}"

  return 0
}

Here I need to do a few more things than in the build step. First is the login, which is done via buildah login. Something that slightly annoys me here is that Buildah only seems to support either interactive input of the password or providing it via a CLI flag, but not, e.g., via an environment variable.
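
If buildah login supports --password-stdin the same way podman does, which I still need to verify, piping the secret in would at least keep it out of the process list:

echo "${PLUGIN_PASSWORD}" | buildah login -u "${PLUGIN_USERNAME}" --password-stdin "${PLUGIN_REPO}"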

When the login succeeds, the code iterates over all platforms and adds each $PLATFORM:0.0 image to the new manifest. Once that’s all done, the resulting manifest, containing all the required platforms’ images, is pushed to the repository given in the plugin’s repo option, once for each configured tag.

I prefer having a plugin like this, because Woodpecker’s “command form” steps cannot re-use Yaml anchors like I was able to do here, so there would have been a lot more repetition in the pipeline setups.

Performance

After I got the plugin working, I started migrating my existing image builds over to it. I started out with my Fluentd image, where I take the official Fluentd image and install a few additional plugins before deploying it into my Kubernetes cluster. The Containerfile looks like this:

ARG fluentd_ver

FROM fluent/fluentd:${fluentd_ver}

USER root

RUN ln -s /usr/bin/dpkg-split /usr/sbin/dpkg-split
RUN ln -s /usr/bin/dpkg-deb /usr/sbin/dpkg-deb
RUN ln -s /bin/rm /usr/sbin/rm
RUN ln -s /bin/tar /usr/sbin/tar

RUN buildDeps="sudo make gcc g++ libc-dev" \
	&& apt-get update \
	&& apt-get install -y --no-install-recommends $buildDeps curl \
	&& gem install \
       fluent-plugin-grafana-loki \
       fluent-plugin-record-modifier \
	     fluent-plugin-multi-format-parser \
	     fluent-plugin-rewrite-tag-filter \
	     fluent-plugin-route \
	     fluent-plugin-http-healthcheck \
	     fluent-plugin-kv-parser \
	     fluent-plugin-parser-logfmt \
	&& gem sources --clear-all \
  && SUDO_FORCE_REMOVE=yes \
      apt-get purge -y --auto-remove \
                    -o APT::AutoRemove::RecommendsImportant=false \
                    $buildDeps \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem

USER fluent

And that’s where I discovered that performance still wasn’t exactly up to snuff:

A screenshot of Woodpecker's CI run UI. On the left, it shows the Fluentd build and its steps. The clone step finishes in 15s, but the two build steps for amd64 and arm64 take 22:57 and 23:32 respectively. The final 'push image' step takes 04:49 and failed. To the right are some logs of the amd64 image build, showing the executed buildah command and the initial pull of the fluentd/fluentd:v1.19.0-debian-1.0 image. To the very right of the output, a relative timestamp shows that the first step after the image pull, 'USER root', happens 1087s after the start of the step's run.

The Fluentd image build takes around 23 minutes, with the lion’s share, 1087 seconds or roughly 18 minutes, taken by the pull of the Fluentd base image.

So here is a problem: the Fluentd build takes over 23 minutes. That’s a lot, and from the logs it looks like the initial pull of the official Fluentd image takes 18 minutes on its own. Though not shown here, it’s a similar situation for the arm64 build. I checked my connection, and the image was pulled from my local Harbor pull-through cache, so this was not just a case of DockerHub being slow.

The problem here again seems to be CephFS and/or the nature of container images on disk. For a long stretch of the build, the Ceph cluster was adding 10k new objects per 15s interval:

A screenshot of a Grafana time series plot. It shows the object count changes per Ceph pool. Of interest here is the CephFS bulk pool. Starting at about 10:32, it produces 10k new objects per 15s, and does so almost continuously until 10:54.

Objects added in a 15s interval to the pools of my Ceph cluster. Orange/top line is my CephFS storage pool.

In total, this single two-image build added about 180k objects to the cluster:

Another screenshot of a Grafana time series plot. This time it shows the number of objects in the entire cluster. It starts out at about 2.03 million objects. At around 10:32, it starts rising at a pretty consistent rate, until it hits its peak of about 2.20 million objects at around 10:54. Afterwards, it's stable for a little while at around 2.19 million, before it goes down steadily again to the previous 2.03 million in the span of just 10 minutes.

The CI run produced about 180k new objects in the Ceph storage cluster.

After seeing all of this, I decided that the current setup might not be ideal when it comes to storage. One thought I had was that both builds using the same --root on the shared volume might be part of the problem, perhaps because Buildah does some locking of the storage area. So I switched the different platform builds to separate directories on the shared volume. That did help somewhat, reducing the duration to about 15 minutes:

Another screenshot of the Woodpecker UI, showing the same Fluentd build as before. This time, the amd64 and arm64 image build steps only took 15:06 and 14:56 respectively. The push image step still failed. The relative timestamp on the right now shows that the 'USER root' step of the Dockerfile started after 659 seconds this time.

Still with a shared volume, but not with a shared directory on that volume, the builds take less time.

The builds go from about 24 minutes down to 15 minutes, and the initial pull of the Fluentd image goes down from the previous 18 minutes to about 11.
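
The change itself is tiny: in the plugin script, the storage root just gets a platform-specific suffix, something along these lines:

# Give each platform its own storage root on the shared volume,
# turning e.g. linux/amd64 into /woodpecker/linux-amd64.
DATA_ROOT="/woodpecker/$(echo "${PLUGIN_PLATFORM}" | tr '/' '-')"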

This still seemed pretty long, so I started considering creating a new CephFS with its data pool on SSDs, hoping to improve performance. But then I had a thought: how about removing the parallelism entirely? If I didn’t run the steps in parallel, I could use a Ceph RBD instead, which would likely already be faster. I also already have a StorageClass for SSD-backed RBDs in my cluster, so no additional config would be necessary. And finally, using a Ceph RBD instead of CephFS, I would be able to use the faster OverlayFS storage driver for Buildah.

So I did all of that, switched the StorageClass for Woodpecker’s pipeline volumes to my SSD RBD class, and then disabled parallelism for the steps. The results were rather impressive:

Another screenshot of the Woodpecker UI, this time showing the image build steps only taking 01:42 minutes and 01:53 minutes. The push image step is successful now as well, taking 02:07 minutes. To the right, the logs of the image pull for the Fluentd image are shown again. The pull now took only 18s.

Both builds done sequentially on an SSD-backed Ceph RBD are faster than the same builds done in parallel, but on a CephFS volume with the VFS storage driver.

The entire pipeline ran through in about six minutes, less time than the previous setup needed just for pulling down the Fluentd image.
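
For reference, the two changes boil down to very little. Serializing the builds is just a matter of the depends_on fields, and the pipeline volume’s StorageClass comes from the agent’s Kubernetes backend settings. The env var name below is written from memory and the class name is a placeholder, so double-check both against the Woodpecker docs and your cluster:

# values.yaml for the Woodpecker chart:
agent:
  env:
    WOODPECKER_BACKEND_K8S_STORAGE_CLASS: "ceph-rbd-ssd"

# In the pipeline, the arm64 build now waits for the amd64 one:
  - name: build arm64 image
    depends_on: ["build amd64 image"]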

Final thoughts

Even with all the weird errors I had to fix and the wrong turns I took, this was fun, and the fact that I ended up without any parallelism was surprising. I really enjoyed working on this one.

There are still a few improvements to be made and some things to dig into. One burning question I currently have is why the parallelized version, using the VFS storage driver on a CephFS shared volume, was so much slower. Was it mostly the slower VFS storage driver? Or was it CephFS? And if it was CephFS, what was the actual bottleneck? I wasn’t able to find one, neither in I/O utilization, nor network, nor CPU on any of the nodes involved. I checked both the nodes running the Buildah Pods and the Ceph nodes, and none of them showed an overload of any resource. So I’m a bit stumped.

Then there’s also the fact that my Woodpecker steps still need to run in privileged mode. I don’t like that, but I wasn’t able to figure out exactly what’s needed to remove that requirement. From everything I’ve read, rootless builds should be possible with Buildah, but they might need some additional configuration on the Kubernetes nodes. I will have to look into this in the future.

But for now, finally back to working on setting up a Bookwyrm instance.