Wherein I migrate my Drone CI setup on Nomad to a Woodpecker CI setup on k8s.

This is part 16 of my k8s migration series.

Finally, another migration blog post! I’m still rather happy that I’m getting into it again. For several years now, I’ve been running a CI setup to automate a number of tasks related to some personal projects. CI stands for Continuous Integration, and Wikipedia says this about it:

Continuous integration (CI) is the practice of integrating source code changes frequently and ensuring that the integrated codebase is in a workable state.

I’m pretty intimately familiar with the concept on a rather large scale, as I’m working in a CI team at a large company.

In the Homelab, I’m using CI for a variety of use cases, ranging from traditional automated tests for software I’ve written to convenient automation for things like container image builds. I will go into detail on a few of those use cases later on, when I describe how I’ve migrated some of my projects.

The basic principle of CI for me is: You push a commit to a Git repository, and a piece of software automatically launches a variety of test jobs. These can range from unit test jobs and automated linter runs to automated deployments of the updated software.

From Drone CI to Woodpecker CI

Since I started running a CI, I’ve been using Drone CI. It’s a relatively simple CI system, compared to what one could build e.g. with Zuul, Jenkins and Gerrit.

Drone CI consists of two components: the Drone CI server, which provides webhooks for the Git forge to call and schedules the jobs, and agents, which pick up the jobs and run them. In my deployment on Nomad, I was using the drone-runner-docker. It mounts the host’s Docker socket into the agent and uses it to launch Docker containers for each step of the CI pipeline.

It has always worked well for me and mostly got out of my way. So I didn’t switch to Woodpecker CI because of features. There aren’t that many different features anyway, because Woodpecker is a community fork of Drone CI. Rather, Drone CI started to have quite a bad smell. What bothered me the most was that their release notes were basically empty and said things like “integrated UI updates”. Then there’s whatever has been happening since they were bought by Harness. And then there’s the fact that the component which needs to mount your host’s Docker socket hasn’t been updated in over a year.

In contrast, Woodpecker is a community project and had a far nicer smell, so I decided that while I was at it, I would not just migrate Drone to k8s but also switch to Woodpecker.

One of the things I genuinely looked forward to was the Kubernetes backend. With the migration to k8s, I could finally make use of my entire cluster. With Drone’s Docker runner, I always had to reserve a lot of resources for CI job execution on the nodes where the agents were launched. Now, with the Kubernetes backend, it doesn’t matter (much, more on that later) where the agents are running - the only thing they do is launch Pods for each step of the pipeline, and where those are scheduled is left to Kubernetes.

I will go into more detail later, when talking about my CI job migrations, but let me still give a short example of what I’m actually talking about.

Here’s a slight variation of the example pipeline from the Woodpecker docs:

when:
  - event: push
    branch: master

steps:
  - name: build
    image: debian
    commands:
      - echo "This is the build step"
      - echo "binary-data-123" > executable
  - name: a-test-step
    image: golang:1.16
    commands:
      - echo "Testing ..."
      - ./executable

This pipeline tells Woodpecker that it should only be run when a Git push is done to the master branch of the repository. This file would be committed to the repository it’s used in, but there are also options to tell Woodpecker to listen for events from other repositories. So you could theoretically even have a separate “CI” repository with all the pipelines. But that’s generally not a good idea.

The pipeline itself will execute two separate steps, called “build” and “a-test-step”. The image: parameter defines which container image each step runs in, in this case the Debian and golang images. Then follows a list of commands to be run. In this case, they’re pretty nonsensical and will lead to a failed pipeline, but they’re only here for demonstration purposes anyway. In the Woodpecker web UI, this is what the pipeline looks like:

A screenshot of the Woodpecker web UI. It is separated into two main areas. The left one shows an overview of the pipeline and its steps. At the top left, it shows that the pipeline was launched by a push from user mmeier. Below that follows the list of steps, showing in order: clone, build, a-test-step. Both clone and build have a green check mark next to them, while a-test-step has a red X. The a-test-step step is also highlighted. On the right side, a window header 'Step Logs' shows the logs from the a-test-step execution. It starts out echoing the string 'Testing ...', followed by '/bin/sh: 18: ./executable: Permission denied'.

Screenshot of my first Woodpecker CI pipeline execution.

Database deployment

To begin with, Woodpecker needs a bit of infrastructure set up, namely a Postgres database. Smaller deployments can also run on SQLite; I’m using Postgres mostly out of habit.

As I’ve written about before, I’m using CloudNativePG for my Postgres DB needs. In the recent 1.25 release, CNPG introduced support for creating multiple databases in a single Cluster. But because I’ve already started with “one Cluster per app”, I decided to stay with that approach for the duration of the k8s migration and look into merging it all into one Cluster later.

Because I’ve written about it in detail before, here are just the basic options for the CNPG Cluster CRD I’m using:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: woodpecker-pg-cluster
  labels:
    homelab/part-of: woodpecker
spec:
  instances: 2
  imageName: "ghcr.io/cloudnative-pg/postgresql:16.2-10"
  bootstrap:
    initdb:
      database: woodpecker
      owner: woodpecker
  resources:
    requests:
      memory: 200M
      cpu: 150m
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "50MB"
      effective_cache_size: "150MB"
      maintenance_work_mem: "12800kB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "1536kB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "300"
      work_mem: "128kB"
      huge_pages: "off"
      max_wal_size: "128MB"
      wal_keep_size: "512MB"
  storage:
    size: 1.5G
    storageClass: rbd-fast
  backup:
    barmanObjectStore:
      endpointURL: http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc:80
      destinationPath: "s3://backup-cnpg/"
      s3Credentials:
        accessKeyId:
          name: rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker
          key: AccessKey
        secretAccessKey:
          name: rook-ceph-object-user-rgw-bulk-cnpg-backup-woodpecker
          key: SecretKey
    retentionPolicy: "30d"
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: woodpecker-pg-backup
spec:
  method: barmanObjectStore
  immediate: true
  schedule: "0 30 1 * * *"
  backupOwnerReference: self
  cluster:
    name: woodpecker-pg-cluster

As always, I’m configuring backups right away. For CNPG to work, the operator needs network access to the Postgres instance started up in the Woodpecker namespace, so a network policy is also needed:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "woodpecker-pg-cluster-allow-operator-ingress"
spec:
  endpointSelector:
    matchLabels:
      cnpg.io/cluster: woodpecker-pg-cluster
  ingress:
    - fromEndpoints:
      - matchLabels:
          io.kubernetes.pod.namespace: cnpg-operator
          app.kubernetes.io/name: cloudnative-pg

While we’re on the topic of network policies, here’s my generic deny-all policy I’m using in most namespaces:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "woodpecker-deny-all-ingress"
spec:
  endpointSelector: {}
  ingress:
    - fromEndpoints:
      - {}

This allows all intra-namespace access between Pods, but no ingress from any Pods in other namespaces.

And because Woodpecker provides a web UI, I also need to provide access to the server Pod from my Traefik ingress:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "woodpecker-traefik-access"
spec:
  endpointSelector:
    matchExpressions:
      - key: "app.kubernetes.io/name"
        operator: In
        values:
          - "server"
  ingress:
    - fromEndpoints:
      - matchLabels:
          homelab/ingress: "true"
          io.kubernetes.pod.namespace: traefik-ingress

Hm, writing all of this up I’m realizing that I completely forgot to write a post about some “standard things” I will be doing for most apps. I had planned to do that for the migration of my Audiobookshelf instance to k8s, but completely forgot to write any post about it at all. Will put it on the pile. 😄

Before getting to the Woodpecker Helm chart, we also need to do a bit of yak shaving with regard to the CNPG DB Secrets. Helpfully, CNPG always creates a Secret with the necessary credentials to access the database, in multiple formats. An example would look like this:

data:
  dbname: woodpecker
  host: woodpecker-pg-cluster-rw
  jdbc-uri: jdbc:postgresql://woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker?password=1234&user=woodpecker
  password: 1234
  pgpass: woodpecker-pg-cluster-rw:5432:woodpecker:woodpecker:1234
  port: 5432
  uri: postgresql://woodpecker:1234@woodpecker-pg-cluster-rw.woodpecker:5432/woodpecker
  user: woodpecker
  username: woodpecker

I would love to be able to use the values from that Secret verbatim, specifically the uri property, to set the WOODPECKER_DATABASE_DATASOURCE variable. But sadly, the Woodpecker Helm chart is one of those which only allow Secrets to be used for environment variables via envFrom.secretRef. That feeds all of the Secret’s keys in as env variables, but doesn’t allow assigning specific keys from the Secret to specific env variables via env.valueFrom.secretKeyRef.

I think this is functionality every Helm chart should provide, specifically for cases like this. I’ve got two tools which automatically create Secrets in my cluster: CNPG for DB credentials and configs, and Rook, which creates Secrets and ConfigMaps for S3 buckets and Ceph users created through its CRDs. But every tool/Helm chart seems to have its own ideas about which env variables certain things should be stored in. The S3 credential env vars in the case of Rook’s S3 buckets should work in most cases because they’re pretty standardized, but everything else is pretty much hit-and-miss.

And, with the env.valueFrom functionality for both Secrets and ConfigMaps, Kubernetes already provides the necessary utility to assign specific keys from them to specific env vars. A number of Helm charts just need to let me make use of that, instead of insisting on Secrets with a specific set of keys.
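For reference, here is a minimal sketch of that standard mechanism in a plain container spec, with hypothetical names - not something the Woodpecker chart currently exposes:

# Minimal sketch of env.valueFrom.secretKeyRef, with hypothetical names.
# This is plain Kubernetes and maps a single key from an existing Secret
# onto a specific environment variable.
containers:
  - name: server
    image: woodpeckerci/woodpecker-server
    env:
      - name: WOODPECKER_DATABASE_DATASOURCE
        valueFrom:
          secretKeyRef:
            name: woodpecker-pg-cluster-app # Secret created automatically by CNPG
            key: uri                        # the key holding the connection string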

Anyway, in the case of Secrets, I’ve found a pretty roundabout way to achieve what I want, namely being able to use automatically created credentials. And I’m using my External Secrets deployment for this, more specifically the ability to configure a Kubernetes namespace as a SecretStore:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: secrets-store
  labels:
    homelab/part-of: woodpecker
spec:
  provider:
    kubernetes:
      remoteNamespace: woodpecker
      auth:
        serviceAccount:
          name: ext-secrets-woodpecker
      server:
        caProvider:
          type: ConfigMap
          name: kube-root-ca.crt
          key: ca.crt
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ext-secrets-woodpecker
  labels:
    homelab/part-of: woodpecker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ext-secrets-woodpecker-role
  labels:
    homelab/part-of: woodpecker
rules:
  - apiGroups: [""]
    resources:
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - authorization.k8s.io
    resources:
      - selfsubjectrulesreviews
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    homelab/part-of: woodpecker
  name: ext-secrets-woodpecker
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ext-secrets-woodpecker-role
subjects:
- kind: ServiceAccount
  name: ext-secrets-woodpecker
  namespace: woodpecker

This SecretStore then allows me to use an ExternalSecret to take the automatically created CNPG Secret and bring it into a format usable with the Woodpecker Helm chart. I decided that I would use the envFrom.secretRef method to turn all of the Secret’s keys into env variables:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "woodpecker-db-secret"
  labels:
    homelab/part-of: woodpecker
spec:
  secretStoreRef:
    name: secrets-store
    kind: SecretStore
  refreshInterval: "1h"
  target:
    creationPolicy: 'Owner'
  data:
    - secretKey: WOODPECKER_DATABASE_DATASOURCE
      remoteRef:
        key: woodpecker-pg-cluster-app
        property: uri

That ExternalSecret takes the uri key from the automatically created CNPG Secret and writes its content into a new Secret’s WOODPECKER_DATABASE_DATASOURCE key. And just like that, I have a Secret in the right format to use it with Woodpecker’s Helm chart.

After I implemented the above, I had another thought on how I could do the same thing without taking the detour via an ExternalSecret. The Helm chart provides options to add extra volume mounts. Furthermore, Woodpecker has the WOODPECKER_DATABASE_DATASOURCE_FILE variable, which allows reading the connection string from a file. So I could have mounted the CNPG DB Secret as a volume and pointed this variable at the path of the file containing the uri key. Sadly, I found this a bit too late, but I will keep the possibility in mind should I come across another Helm chart which lacks the ability to assign arbitrary Secret keys to env variables.
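A rough, untested sketch of what that could look like in the Helm values - assuming the chart exposes extra volume options under the keys shown here, so check the chart’s values.yaml for the exact names:

# Rough sketch, not tested: mount the CNPG Secret and point Woodpecker at the file.
# The extraVolumes/extraVolumeMounts keys are an assumption about the chart's values.
server:
  env:
    WOODPECKER_DATABASE_DATASOURCE_FILE: /cnpg-secret/uri
  extraVolumes:
    - name: cnpg-db-secret
      secret:
        secretName: woodpecker-pg-cluster-app
  extraVolumeMounts:
    - name: cnpg-db-secret
      mountPath: /cnpg-secret
      readOnly: true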

Temporary StorageClass

Woodpecker needs some storage for every pipeline executed. That storage is shared between all steps and is used to clone the repository and share intermediate artifacts between steps.

With the Kubernetes backend, Woodpecker uses PersistentVolumeClaims, one per pipeline run. It also automatically cleans those up after the pipeline has run through. The issue for me is that in my Rook Ceph setup, the StorageClasses all have their reclaim policy set to Retain. This is mostly because I’m not the smartest guy under the sun, and there’s a real chance that I might accidentally remove a PVC with data I would really like to keep. But that’s a problem for these temporary PVCs, which are only relevant for the duration of a single pipeline run. Using my standard StorageClasses would mean ending up with a lot of unused PersistentVolumes.

So I had to create another StorageClass with the reclaim policy set to Delete:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: homelab-fs-temp
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
parameters:
  clusterID: rook-cluster
  fsName: homelab-fs
  pool: homelab-fs-bulk
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: "{{ .Release.Namespace }}"
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: "{{ .Release.Namespace }}"
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: "{{ .Release.Namespace }}"

This uses CephFS as the provisioner, because I want those volumes to be RWX capable, which is not the case for RBD-based volumes.

Using this StorageClass, the PersistentVolume is deleted when the PVC is deleted, freeing the space for the next pipeline run.
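Just for illustration, the PVCs Woodpecker creates against this StorageClass correspond roughly to the following. The name is hypothetical, and the size is configured on the agent side, which I’ll get to later - there’s nothing here to apply manually:

# Illustration only - Woodpecker creates (and deletes) these PVCs itself.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-example-pipeline-0-default   # hypothetical name
  namespace: woodpecker
spec:
  accessModes:
    - ReadWriteMany                 # RWX, so every step's Pod can mount it, wherever it runs
  storageClassName: homelab-fs-temp # reclaimPolicy Delete, so the PV goes away with the PVC
  resources:
    requests:
      storage: 10G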

Gitea configuration

Because Woodpecker needs access to Gitea, some configuration is necessary on that side as well, mainly because Woodpecker doesn’t have its own authentication and instead relies on the forge it’s connected to.

To begin with, Woodpecker needs to be added as an OAuth2 application. This can be done by any user, under the https://gitea.example.com/user/settings/applications URL. The configuration is the same as for any other OAuth2 client: Woodpecker needs a client ID and a client secret.

The application can be given any name, and the redirect URL has to be https://<your-woodpecker-url>/authorize:

A screenshot of Gitea's OAuth2 client app creation form. In the 'Application Name' field, it shows 'Woodpecker Blog Example', and in the 'Redirect URIs' field, it shows 'https://ci.example.com/authorize'. The 'Confidential Client' option is enabled.

Gitea’s OAuth2 creation form.

After clicking Create Application, Gitea creates the app and shows the necessary information:

A screenshot of Gitea's OAuth2 app information screen. It shows the randomly generated 'Client ID' and 'Client Secret' and allows changing the 'Application Name' and 'Redirect URIs' fields.

Gitea’s OAuth2 information page.

I then copied the Client ID and Client Secret fields into my Vault instance and provided them to Kubernetes with another ExternalSecret:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "gitea-secret"
  labels:
    homelab/part-of: woodpecker
spec:
  secretStoreRef:
    name: hashi-vault-store
    kind: ClusterSecretStore
  refreshInterval: "1h"
  target:
    creationPolicy: 'Owner'
  data:
    - secretKey: WOODPECKER_GITEA_CLIENT
      remoteRef:
        key: secret/gitea-oauth
        property: clientid
    - secretKey: WOODPECKER_GITEA_SECRET
      remoteRef:
        key: secret/gitea-oauth
        property: clientSecret

That was all the Gitea config necessary. There’s going to be one more step when accessing Woodpecker for the first time. Because it uses OAuth2, it will redirect you to Gitea to log in, and Gitea will then need confirmation that Woodpecker can access your account info and repositories.

Deploying Woodpecker

For deploying Woodpecker itself, I’m using the official Helm chart. It’s split into two subcharts, one for the agents which run the pipelines and one for the server. Let’s start with the server part of the values.yaml:

server:
  enabled: true
  metrics:
    enabled: false
  env:
    WOODPECKER_OPEN: "false"
    WOODPECKER_HOST: 'https://ci.example.com'
    WOODPECKER_DISABLE_USER_AGENT_REGISTRATION: "true"
    WOODPECKER_DATABASE_DRIVER: "postgres"
    WOODPECKER_GITEA: "true"
    WOODPECKER_GITEA_URL: "https://gitea.example.com"
    WOODPECKER_PLUGINS_PRIVILEGED: "woodpeckerci/plugin-docker-buildx:latest-insecure"
  extraSecretNamesForEnvFrom:
    - gitea-secret
    - woodpecker-db-secret
  persistentVolume:
    enabled: true
    storageClass: rbd-bulk
  ingress:
    enabled: true
    annotations:
      traefik.ingress.kubernetes.io/router.entrypoints: secureweb
    hosts:
      - host: ci.example.com
        paths:
          - path: /
  resources:
    requests:
      cpu: 100m
    limits:
      memory: 128Mi

As I do so often, I explicitly set metrics.enabled to false, so that later I can go through my Homelab repo and slowly enable metrics for the apps I’m interested in, just by grepping for metrics.

Woodpecker is entirely configured through environment variables. I’ve configured those which don’t contain secrets right in the values.yaml, and the secrets are added via the extraSecretNamesForEnvFrom list. Those are the Gitea OAuth2 and CNPG DB Secrets. The server itself also needs some storage space, which I put on my bulk storage pool with the persistentVolume option. I’m also configuring the Ingress and resources.

A short comment on the resources: Make sure that you know what you’re doing. 😅 I initially had the cpu: 100m resource set under limits accidentally. And then I was wondering yesterday why the Woodpecker server was restarted so often due to failed liveness probes. Turns out that the 100m is not enough CPU when the Pod happens to run on a Pi 4 and I’m also clicking around in the Web UI. The liveness probe then doesn’t get a timely answer and starts failing, ultimately restarting the Pod.

The second part of a Woodpecker deployment are the agents. They are the part of Woodpecker that runs the actual pipelines, launching the containers for each step. Woodpecker supports multiple backends. The first one is the traditional Docker backend, which needs the agent to have access to the Docker socket. That’s the setup I had been running up to now with Drone. It had two big downsides for me: first, a piece of software explicitly intended to execute arbitrary code had full access to the host’s Docker daemon. Second, the agent could only run pipelines on its own host, which meant that it couldn’t distribute the individual steps over my entire Nomad cluster.

Now, with Woodpecker, I’m making use of the Kubernetes Backend. With this backend, the agents themselves only work as an interface to the k8s API, launching one Pod for each step and creating the PVC used as shared storage for all steps of a pipeline.

One quirk of the Kubernetes backend is that it adds a nodeSelector matching the architecture of the agent that launches the pipeline. So when the agent executing a pipeline happens to be an ARM64 machine, all Pods for that pipeline will also run on ARM64 machines. But this can be controlled for individual steps as well, as shown below.
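A hedged sketch of such a per-step override, based on my understanding of the Kubernetes backend’s backend_options syntax - check the backend docs for the exact keys:

steps:
  - name: build-on-arm64
    image: debian
    commands:
      - echo "running on an ARM64 node"
    # Assumed per-step override via the Kubernetes backend options;
    # this pins the step's Pod to ARM64 nodes regardless of the agent's architecture.
    backend_options:
      kubernetes:
        nodeSelector:
          kubernetes.io/arch: arm64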

Here is the agent portion of the Woodpecker Helm values.yaml:

agent:
  enabled: true
  replicaCount: 2
  env:
    WOODPECKER_BACKEND: kubernetes
    WOODPECKER_MAX_WORKFLOWS: 2
    WOODPECKER_BACKEND_K8S_NAMESPACE: woodpecker
    WOODPECKER_BACKEND_K8S_VOLUME_SIZE: 10G
    WOODPECKER_BACKEND_K8S_STORAGE_CLASS: homelab-fs-temp
    WOODPECKER_BACKEND_K8S_STORAGE_RWX: "true"
  persistence:
    enabled: true
    storageClass: rbd-bulk
    accessModes:
      - ReadWriteOnce
  serviceAccount:
    create: true
    rbac:
      create: true
  resources:
    requests:
      cpu: 100m
    limits:
      memory: 128Mi
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/arch"
      whenUnsatisfiable: "DoNotSchedule"
      labelSelector:
        matchLabels:
          "app.kubernetes.io/name": agent
          "app.kubernetes.io/instance": woodpecker

Here I’m configuring two agents, and the topologySpreadConstraints ensure that they run on different architectures. In my cluster, this leads to one agent running on AMD64 and one running on ARM64. I’m also telling the agents which StorageClass to use for the pipeline volumes - the one with the Delete reclaim policy I created above - and setting a default size of 10 GB for them.

Before continuing with some CI pipeline configs, let’s have a short look at the Pods Woodpecker launches. I’ve captured the Pod for the following Woodpecker CI step:

  - name: build
    image: debian
    commands:
      - echo "This is the build step"
      - echo "binary-data-123" > executable
      - chmod u+x ./executable
      - sleep 120

It looks like this:

apiVersion: v1
kind: Pod
metadata:
  labels:
    step: build
  name: wp-01jhkac6pf4jyfywavjg6be5cq
  namespace: woodpecker
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - echo $CI_SCRIPT | base64 -d | /bin/sh -e
    env:
    - name: CI
      value: woodpecker
    - name: CI_COMMIT_AUTHOR
      value: mmeier
    - name: CI_COMMIT_AUTHOR_AVATAR
      value: https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e
    - name: CI_COMMIT_AUTHOR_EMAIL
      value: mmeier@noreply.gitea.example.com
    - name: CI_COMMIT_BRANCH
      value: master
    - name: CI_COMMIT_MESSAGE
      value: |
        Add a sleep to inspect the Pod        
    - name: CI_COMMIT_REF
      value: refs/heads/master
    - name: CI_COMMIT_SHA
      value: 353b9f67102ba120ffe9284aa711eb87c2542573
    - name: CI_COMMIT_URL
      value: https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573
    - name: CI_FORGE_TYPE
      value: gitea
    - name: CI_FORGE_URL
      value: https://gitea.example.com
    - name: CI_MACHINE
      value: woodpecker-agent-1
    - name: CI_PIPELINE_CREATED
      value: "1736888948"
    - name: CI_PIPELINE_EVENT
      value: push
    - name: CI_PIPELINE_FILES
      value: '[".woodpecker/my-first-workflow.yaml"]'
    - name: CI_PIPELINE_FINISHED
      value: "1736888960"
    - name: CI_PIPELINE_FORGE_URL
      value: https://gitea.example.com/adm/ci-tests/commit/353b9f67102ba120ffe9284aa711eb87c2542573
    - name: CI_PIPELINE_NUMBER
      value: "3"
    - name: CI_PIPELINE_PARENT
      value: "0"
    - name: CI_PIPELINE_STARTED
      value: "1736888951"
    - name: CI_PIPELINE_STATUS
      value: success
    - name: CI_PIPELINE_URL
      value: https://ci.example.com/repos/1/pipeline/3
    - name: CI_PREV_COMMIT_AUTHOR
      value: mmeier
    - name: CI_PREV_COMMIT_AUTHOR_AVATAR
      value: https://gitea.example.com/avatars/d941e68cc8aa38efdee91c3e3c97159e
    - name: CI_PREV_COMMIT_AUTHOR_EMAIL
      value: mmeier@noreply.gitea.example.com
    - name: CI_PREV_COMMIT_BRANCH
      value: master
    - name: CI_PREV_COMMIT_MESSAGE
      value: |
        Possibly fix permission error        
    - name: CI_PREV_COMMIT_REF
      value: refs/heads/master
    - name: CI_PREV_COMMIT_SHA
      value: b680ab9b9a7aa300d80a43bd389de0e57f767e4f
    - name: CI_PREV_COMMIT_URL
      value: https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f
    - name: CI_PREV_PIPELINE_CREATED
      value: "1736800786"
    - name: CI_PREV_PIPELINE_EVENT
      value: push
    - name: CI_PREV_PIPELINE_FINISHED
      value: "1736800827"
    - name: CI_PREV_PIPELINE_FORGE_URL
      value: https://gitea.example.com/adm/ci-tests/commit/b680ab9b9a7aa300d80a43bd389de0e57f767e4f
    - name: CI_PREV_PIPELINE_NUMBER
      value: "2"
    - name: CI_PREV_PIPELINE_PARENT
      value: "0"
    - name: CI_PREV_PIPELINE_STARTED
      value: "1736800790"
    - name: CI_PREV_PIPELINE_STATUS
      value: failure
    - name: CI_PREV_PIPELINE_URL
      value: https://ci.example.com/repos/1/pipeline/2
    - name: CI_REPO
      value: adm/ci-tests
    - name: CI_REPO_CLONE_SSH_URL
      value: ssh://gituser@git.example.com:1234/adm/ci-tests.git
    - name: CI_REPO_CLONE_URL
      value: https://gitea.example.com/adm/ci-tests.git
    - name: CI_REPO_DEFAULT_BRANCH
      value: master
    - name: CI_REPO_NAME
      value: ci-tests
    - name: CI_REPO_OWNER
      value: adm
    - name: CI_REPO_PRIVATE
      value: "true"
    - name: CI_REPO_REMOTE_ID
      value: "94"
    - name: CI_REPO_SCM
      value: git
    - name: CI_REPO_TRUSTED
      value: "false"
    - name: CI_REPO_URL
      value: https://gitea.example.com/adm/ci-tests
    - name: CI_STEP_FINISHED
      value: "1736888960"
    - name: CI_STEP_NUMBER
      value: "0"
    - name: CI_STEP_STARTED
      value: "1736888951"
    - name: CI_STEP_STATUS
      value: success
    - name: CI_STEP_URL
      value: https://ci.example.com/repos/1/pipeline/3
    - name: CI_SYSTEM_HOST
      value: ci.example.com
    - name: CI_SYSTEM_NAME
      value: woodpecker
    - name: CI_SYSTEM_PLATFORM
      value: linux/amd64
    - name: CI_SYSTEM_URL
      value: https://ci.example.com
    - name: CI_SYSTEM_VERSION
      value: 2.8.1
    - name: CI_WORKFLOW_NAME
      value: my-first-workflow
    - name: CI_WORKFLOW_NUMBER
      value: "1"
    - name: CI_WORKSPACE
      value: /woodpecker/src/gitea.example.com/adm/ci-tests
    - name: HOME
      value: /root
    - name: CI_SCRIPT
      value: CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK
    - name: SHELL
      value: /bin/sh
    image: debian
    imagePullPolicy: Always
    name: wp-01jhkac6pf4jyfywavjg6be5cq
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /woodpecker
      name: wp-01jhkac6pf4jyfywavjasgpcwn-0-default
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-n75dj
      readOnly: true
    workingDir: /woodpecker/src/gitea.example.com/adm/ci-tests
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  nodeName: sehith
  nodeSelector:
    kubernetes.io/arch: amd64
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: wp-01jhkac6pf4jyfywavjasgpcwn-0-default
    persistentVolumeClaim:
      claimName: wp-01jhkac6pf4jyfywavjasgpcwn-0-default
  - name: kube-api-access-n75dj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

There are a number of noteworthy things in here. First perhaps the handling of the script to execute for the job:

  - command:
    - /bin/sh
    - -c
    - echo $CI_SCRIPT | base64 -d | /bin/sh -e
    env:
    - name: CI_SCRIPT
      value: CmlmIFsgLW4gIiRDSV9ORVRSQ19NQUNISU5FIiBdOyB0aGVuCmNhdCA8PEVPRiA+ICRIT01FLy5uZXRyYwptYWNoaW5lICRDSV9ORVRSQ19NQUNISU5FCmxvZ2luICRDSV9ORVRSQ19VU0VSTkFNRQpwYXNzd29yZCAkQ0lfTkVUUkNfUEFTU1dPUkQKRU9GCmNobW9kIDA2MDAgJEhPTUUvLm5ldHJjCmZpCnVuc2V0IENJX05FVFJDX1VTRVJOQU1FCnVuc2V0IENJX05FVFJDX1BBU1NXT1JECnVuc2V0IENJX1NDUklQVAoKZWNobyArICdlY2hvICJUaGlzIGlzIHRoZSBidWlsZCBzdGVwIicKZWNobyAiVGhpcyBpcyB0aGUgYnVpbGQgc3RlcCIKCmVjaG8gKyAnZWNobyAiYmluYXJ5LWRhdGEtMTIzIiA+IGV4ZWN1dGFibGUnCmVjaG8gImJpbmFyeS1kYXRhLTEyMyIgPiBleGVjdXRhYmxlCgplY2hvICsgJ2NobW9kIHUreCAuL2V4ZWN1dGFibGUnCmNobW9kIHUreCAuL2V4ZWN1dGFibGUKCmVjaG8gKyAnc2xlZXAgMTIwJwpzbGVlcCAxMjAK
    - name: SHELL
      value: /bin/sh

Running the CI_SCRIPT content through base64 -d results in this shell script:

if [ -n "$CI_NETRC_MACHINE" ]; then
cat <<EOF > $HOME/.netrc
machine $CI_NETRC_MACHINE
login $CI_NETRC_USERNAME
password $CI_NETRC_PASSWORD
EOF
chmod 0600 $HOME/.netrc
fi
unset CI_NETRC_USERNAME
unset CI_NETRC_PASSWORD
unset CI_SCRIPT

echo + 'echo "This is the build step"'
echo "This is the build step"

echo + 'echo "binary-data-123" > executable'
echo "binary-data-123" > executable

echo + 'chmod u+x ./executable'
chmod u+x ./executable

echo + 'sleep 120'
sleep 120

This shows that the commands: list from the step in the Woodpecker file is converted into a shell script by copying the commands into the script and adding an echo before each of them.

Looking at this and thinking about my own work on a large CI I’m sometimes wondering what we’d do without the base64 command. 😅

Another aspect of the setup is all of the available environment variables, supplying a lot of information not just on the commit currently being CI tested, but also the previous commit. Most of the CI_ variables also have equivalents prefixed with DRONE_, for backwards compatibility. I removed them in the output above to not make the snippet too long.

Finally, there’s proof of what I said above about the agent’s architecture. This pipeline was run by the agent on my AMD64 node, resulting in the nodeSelector for AMD64 nodes:

  nodeName: sehith
  nodeSelector:
    kubernetes.io/arch: amd64

It’s also nice to see that the Pod ran on sehith, which isn’t the node the agent runs on, showing that the Pods are just submitted to k8s for scheduling and can run on any (in this case AMD64) node.

Before ending the post, let’s have a look at some example CI configurations.

CI configurations

Each repository using Woodpecker needs to be enabled. This is done from Woodpecker’s web UI:

A screenshot of Woodpecker's repo enabling UI. It shows a search field at the top and a list of repositories below. Some of them have a label saying 'Already enabled', while others have an 'Enable' button next to them.

Woodpecker’s repo addition UI.

When clicking the Enable button, Woodpecker will contact Gitea and add a webhook configuration for the repository. With that, Gitea will call the webhook with information about the event which triggered it and the state of the repository.

The Woodpecker configuration files for a specific repository are expected in the .woodpecker/ directory at the repository root by default.

Blog repo example

Here’s the configuration I’m using to build and publish this blog:

when:
  - event: push

steps:
  - name: Hugo Site Build
    image: "harbor.mei-home.net/homelab/hugo:0.125.4-r3"
    commands:
      - hugo
  - name: Missing alt text check
    image: python:3
    commands:
      - pip install lxml beautifulsoup4
      - python3 scripts/alt_text.py ./public/posts/
  - name: Hugo Site Upload
    image: "harbor.mei-home.net/homelab/hugo:latest"
    environment:
      AWS_ACCESS_KEY_ID:
        from_secret: access-key
      AWS_SECRET_ACCESS_KEY:
        from_secret: secret-key
    commands:
      - s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/

To start with, the page needs to be built, using Hugo in an image I build myself, based on Alpine with a couple of tools installed. Then I’m running a short Python script which uses beautifulsoup4 to scan through the generated HTML and make sure that each image has alt text, and that there’s actually something in that alt text. Finally, I push the generated site to an S3 bucket in my Ceph cluster, from where it is served.

The when: at the beginning is important: it determines under which conditions the pipeline is executed. Conditions can be configured for specific branches or certain events, like a push or an update of a pull request, and different conditions can also be combined. In addition to setting conditions on the entire pipeline, they can also be configured on individual steps, as we will see later.
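For instance, a pipeline could run for pushes to master as well as for tags, with one step additionally restricted to tag events - a hedged sketch following the syntax used elsewhere in this post:

# Pipeline-level conditions: run for pushes to master or for tag events.
when:
  - event: push
    branch: master
  - event: tag

steps:
  - name: deploy
    image: debian
    commands:
      - echo "Deploying ..."
    # Step-level condition: this step additionally only runs for tag events.
    when:
      - event: tag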

One thing I find a little bit lacking at the moment, specifically for the Kubernetes use case, is secrets management. It’s currently only possible via the web UI or the CLI. There’s no way to provide specific Kubernetes Secrets to certain steps in a certain pipeline, although there is an open issue on GitHub to implement support for Kubernetes Secrets. Until that is implemented, the UI needs to be used. It looks like this:

A screenshot of Woodpecker's secret configuration UI. It contains a field for a name for the secret and values. In addition, it can be made available only for certain images used in steps. Furthermore, the secret can be restricted to certain events triggering a pipeline run, e.g. only Pushes or Tags or Pull Requests.

Woodpecker’s secret addition UI.

Secrets can be configured for specific repositories, for entire orgs where the forge supports them, and globally for all pipelines.

When looking at a specific repository, all of the pipelines which ran for it are listed:

A screenshot of the pipeline list for the mmeier/blog repository. It shows four pipelines. The first one is called 'CI: Migrate to Woodpecker' and the most recent one 'Publish post on hl-backup-operator deployment'. All of them show that they were pushed directly to the master branch and took about 1 - 2 minutes each. Each pipeline also shows the Git SHA1 of the commit it tested.

Woodpecker’s pipeline list for my blog repo.

This gives a nice overview of the pipelines which ran recently, here with the example of my blog repository, including the most recent run for publishing the post on the backup operator deployment.

Clicking on one of the pipeline runs then shows the overview of that pipeline’s steps and the step logs:

A screenshot of the pipeline run publishing the hl-backup-operator blog article. At the top right is the subject line of the commit message triggering the pipeline again, 'Publish post on hl-backup-operator deployment'. On the left is a list of the steps, showing 'clone', 'Hugo Site Build', 'Missing alt text check', 'Hugo Site Upload'. The 'Hugo Site Build' step is highlighted, and the logs for that step, showing Hugo's build output, are shown on the right side.

Woodpecker’s pipeline view.

This pipeline is not very complex and runs through in about two minutes. So let’s have a look at another pipeline with a bit more complexity.

Docker repo example

Another repository where I’m making quite some use of CI is my Docker repository. In that repo, I’ve got a couple of Dockerfiles for cases where I’m adding something to upstream images or building my own where no upstream container is available.

This repository’s CI is a bit more complicated, mostly because it does the same thing for multiple different Dockerfiles, and because it needs to do different things for pull requests and for commits pushed to the master branch.

And that’s where the problems begin, at least to a certain extent. As I’ve shown above, you can provide a when config to tell Woodpecker under which conditions to run the pipeline. And if you leave that out completely, you don’t end up with the pipeline being run for all commits. No. You end up with the pipeline being run twice for some commits.

Consider, for example, this configuration:

steps:
  - name: build image
    image: woodpeckerci/plugin-docker-buildx:latest-insecure
    settings:
      <<: *dockerx-config
      dry-run: true
    when:
      - event: pull_request

Ignore the config under settings and concentrate on the fact that there’s no when config on the pipeline level, only on the step level. And there’s only one step, which is supposed to run on pull requests. The result of this config is that two pipelines will be started - including Pod launches, PVC creation and so on:

A screenshot of Woodpecker showing two pipelines. One failed, one successful. Both show being run for the same commit. One shows that it was launched by a push event to the 'woodpecker-ci' branch and the other that it was pushed to pull request 77.

The two pipelines started for the previous configuration, both for the same commit.

Pipeline #1 was launched for the push event to the woodpecker-ci branch, and the other for the update of the pull request that push belonged to. The push event pipeline only launched the clone step, while the pull request pipeline launched both the clone step and the build image step.

The root cause for this behavior is that Gitea always triggers the webhook for every matching event, one call per event. And consequently, Woodpecker then launches one pipeline for each event.

A similar effect can be observed when combining both pull request and push events in one when clause on the pipeline level.
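For illustration, such a combined pipeline-level clause would look roughly like this; each matching event results in its own webhook call from Gitea, and thus its own pipeline:

# Illustration: for a commit pushed to a pull request branch, both conditions match,
# so Gitea fires one webhook per event and Woodpecker starts one pipeline per webhook.
when:
  - event: push
  - event: pull_request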

Now, you might be saying: Okay, then just configure the triggers only on the steps, not on the entire pipeline. But that also doesn’t really work. Without a when clause, as shown above, two pipelines are always started for commits in pull requests. And even though one of the pipelines won’t do much, it would still do something. In my case, it would launch a Pod for the clone step and also create a PVC and clone the repo - for nothing.

The next idea I came up with: Okay, then let’s set the pipeline’s when to trigger on push events, because that would trigger the pipeline for both pushes to branches like master and pushes to pull requests. And then just add when clauses to each step with either the pull request or push event, depending on when it is supposed to run. But that won’t work either - any given pipeline only ever sees one event. If I trigger on push events at the pipeline level, steps triggering on the pull request event will never run.

I finally figured out a way to do this: I always trigger the pipeline on push events, and then I use Woodpecker’s evaluate clause on the individual steps to decide which of them run, based on the branch.

With all of that said, this is what the config is looking like for the pipeline which builds my Hugo container:

when:
  - event: push
    path:
      - '.woodpecker/hugo.yaml'
      - 'hugo/*'

variables:
  - &alpine-version '3.21.2'
  - &app-version '0.139.0-r0'
  - &dockerx-config
    debug: true
    repo: harbor.example.com/homelab/hugo
    registry: harbor.example.com
    username: ci
    password:
      from_secret: container-registry
    dockerfile: hugo/Dockerfile
    context: hugo/
    mirror: https://harbor-mirror.example.com
    buildkit_config: |
      debug = true
      [registry."docker.io"]
        mirrors = ["harbor.example.com/dockerhub-cache"]
      [registry."quay.io"]
        mirrors = ["harbor.example.com/quay.io-cache"]
      [registry."ghcr.io"]
        mirrors = ["harbor.example.com/github-cache"]      
    tags:
      - latest
      - *app-version
    build_args:
      hugo_ver: *app-version
      alpine_ver: *alpine-version
    platforms:
      - "linux/amd64"
      - "linux/arm64"

steps:
  - name: build image
    image: woodpeckerci/plugin-docker-buildx:latest-insecure
    settings:
      <<: *dockerx-config
      dry-run: true
    when:
      - evaluate: 'CI_COMMIT_BRANCH != CI_REPO_DEFAULT_BRANCH'
  - name: release image
    image: woodpeckerci/plugin-docker-buildx:latest-insecure
    settings:
      <<: *dockerx-config
      dry-run: false
    when:
      - evaluate: 'CI_COMMIT_BRANCH == CI_REPO_DEFAULT_BRANCH'

First, what does this pipeline do for pull requests and main branch pushes? For pull requests, it uses the buildx plugin to build a Docker image from the Dockerfile at hugo/Dockerfile in the repository. That’s what happens in the build image step. Notably, no push to a registry happens here. In the case of pushes to the repo’s default branch, which is provided by Gitea in the webhook call, the same plugin and build is used, but this time the newly built images are pushed to my Harbor registry. For more details on that setup, see this post.

In the when clause for the pipeline, as I’ve explained above, I’m triggering on the push event to circumvent the problem of multiple pipelines being executed for commits in pull requests. In addition, I’m also making use of path-based triggers. Because I’ve got multiple container images defined in one repository, I’d like to avoid unnecessarily running builds for images which haven’t changed. That’s done by triggering the pipeline only on changes to its own config file and to the hugo/ directory. So if neither the Hugo image definition nor the CI config has changed, the pipeline won’t be triggered.

As you can see, I’m building images for both AMD64 and ARM64. And before I close this section, I have to tell a slightly embarrassing story. I initially tried to run two pipelines - one for each architecture, so that they could both run in parallel on nodes fitting their architecture. This would avoid the cost of emulating a foreign architecture, making the builds faster overall. It seemed like an excellent idea. And it worked really, really well. The pipelines got a couple of minutes faster. Until I had a look at my Harbor instance. As some of you might have already figured out, there was of course not one tag with images for both architectures. Instead, the tag contained whatever the pipeline which finished last had pushed. Because of course, two Docker pushes to the same tag overwrite each other instead of being merged. This is a problem I need to have another look at later. Someone on the Fediverse already showed me that there is a multistep way to do this manually.

Another point that I still need to improve is image caching. I think that there’s still some potential for optimization in my setup. But that’s also something for after the k8s migration is done.

Before I close out this section, I would like to point out a pretty nice feature of Woodpecker: a linter for the pipeline definitions. Its output looks like this, for example:

A screenshot of Woodpecker's linter output. It shows a number of issues with the pipeline config. For example that steps.1.environment and steps.2.environment are both of an invalid type. It expected an array, but got a null. And for all of the steps it outputs a 'bad_habit' warning about the fact that neither the pipeline nor any of the steps have a 'when' clause.

Example output of Woodpecker’s config linter.

The configuration spitting out those warnings is this one for my blog:

steps:
  - name: submodules
    image: alpine/git
    commands:
    - git submodule update --init --recursive
  - name: Hugo Site Build
    image: "harbor.example.com/homelab/hugo:0.125.4-r3"
    environment:
    commands:
      - hugo
  - name: Missing alt text check
    image: python:3
    environment:
    commands:
      - pip install lxml beautifulsoup4
      - python3 scripts/alt_text.py ./public/posts/
  - name: Hugo Site Upload
    image: "harbor.example.com/homelab/hugo:latest"
    environment:
      AWS_ACCESS_KEY_ID:
        from_secret: access-key
      AWS_SECRET_ACCESS_KEY:
        from_secret: secret-key
    commands:
      - s3cmd -c /s3cmd.conf sync -r --delete-removed --delete-after --no-mime-magic ./public/ s3://blog/

The main issues are the empty environment keys, as well as the fact that I did not set any when clause.

Conclusion

And that’s it. Again a pretty long one, but I had never written about my CI setup and wanted to take this chance to do so, also because I had gotten some questions on the Fediverse about what a CI actually does, and some interest in what Woodpecker looks like.

Oh and also, I just have a propensity for long-winded writing. 😅

With this post, the Woodpecker/CI migration to k8s is done, and I’m quite happy with it. Especially the fact that my CI pipeline steps now get distributed over the entire cluster instead of just running on the nodes with the agents.

For the next step I will likely take my Gitea instance and migrate it over, but as this blog post took longer than I thought it would, it might have to wait until next weekend.