Wherein I migrate my internal container registry to Harbor.

This is part 12 of my k8s migration series.

Let’s start by answering the obvious question: Why even have an internal container registry? For me, there are two reasons:

  1. Some place to put my own container images
  2. A cache for external images

Most of my internal images are slightly changed external images. A prime example is my Fluentd image. I’ve extended the official image with a couple of additional plugins. And I needed some place to store them.

My main reason for point 2) is to avoid waste. Why reach out to the Internet and put additional unnecessary load on somebody else’s infrastructure by pulling the same image 12 times? It makes a lot more sense to me to only do that once and then use an internal cache. A secondary reason was of course the introduction of the DockerHub rate limit. I tended to hit that pretty regularly, especially when I was working on my CI.

A tertiary reason is Deutsche Telekom. My ISP. A couple of years ago, they tended to regularly get into peering battles with their tier 1 peering partners, and consequently, you had some days where the entire US was connected down a 512 Kbps pipe. Or at least that was what it felt like. Pulling an image from DockerHub ran with, I kid you not, 5 Kbps. Those days seem to be over, but I still like to at least be able to use previously pulled images.

Finally, there might also be a speed advantage when pulling from a local cache instead of reaching out to the Internet. But for me, that was never really a consideration. I’ve got a 1 Gbps LAN, and most of my storage runs off of a Ceph cluster, with the image cache living on my bulk storage HDDs. So there’s really not going to be that much gain.

In my Nomad cluster, I had set up two instances of Docker’s official registry. Hm, it is now called “distribution”? And seemingly under the CNCF? Ah:

Registry, the open source implementation for storing and distributing container images and other content, has been donated to the CNCF. Registry now goes under the name of Distribution, and the documentation has moved to…

From the official docs on the Docker page.

I chose registry back then because it looked like a pretty low-powered solution. For a GUI, I used docker-registry-ui, which I can warmly recommend.

But I also pretty much ran it as an open registry, which bothered me a bit. Plus, I had looked a lot at Harbor, but always found that it sounded a bit too much oriented towards deployment in Kubernetes. And now that I’m finally running my own Kubernetes cluster, I decided to replace my two registry instances with a single Harbor instance.

Another reason for wanting to look at Harbor was that I think at some point, registry could only serve as a pull-through cache for DockerHub, but not for other registries, e.g. Quay.io. But if I read the docs right, it’s now possible to mirror other registries with it as well.

There are other alternatives as well. The first one, Artifactory, is out, because while I know that it would fulfill my needs, it is also what we use at work. And there is no great love lost between me and Artifactory. It will only get deployed in my Homelab over my dead, cold, decomposing body.

Then there’s Sonatype Nexus. But quite frankly: That always gave off pretty strong “We’re going to go source available within the week” vibes.

Finally, there’s Gitea and their relatively recently introduced package management feature, which also includes a container registry. The main reason I did not go with this one is that it currently doesn’t support pull-through caches, although there’s a feature request. In addition, I’m still a big fan of running apps which do one thing well, instead of everything somewhat decently. (He says, looking at his Nextcloud file sharing/note taking/calendar/contacts/bookmarks moloch 😅)

So Harbor it is. Let’s dig into it.

Harbor setup

To set up Harbor, I used the official Helm chart. It is perfectly workable, but has some quirks when it comes to secrets handling, which I will go into detail about later.
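
If you’d like to follow along, the install itself is unspectacular. The chart lives in the official Harbor Helm repo; the release and Namespace names here are just examples:

helm repo add harbor https://helm.goharbor.io
helm repo update
helm install harbor harbor/harbor --namespace harbor --create-namespace --values values.yaml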

Here is my values.yaml file for the chart:

expose:
  type: ingress
  tls:
    enabled: true
    certSource: none
  ingress:
    hosts:
      core: harbor.example.com
    annotations:
      traefik.ingress.kubernetes.io/router.entrypoints: myentrypoint
    harbor:
      labels:
        homelab/part-of: harbor
externalURL: https://harbor.example.com
ipFamily:
  ipv6:
    enabled: false
persistence:
  enabled: false
  imageChartStorage:
    disableredirect: true
    type: s3
    s3:
      existingSecret: my-harbor-rgw-secret
      bucket: harbor-random-numbers-here
      regionendpoint: http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80
      v4auth: true
      rootdirectory: /harbor
      encrypt: false
      secure: false
logLevel: info
existingSecretAdminPassword: my-admin-secret
existingSecretAdminPasswordKey: mySecretsKey
existingSecretSecretKey: my-harbor-secret-key-secret
portal:
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
  podLabels:
    homelab/part-of: harbor
core:
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
  podLabels:
    homelab/part-of: harbor
jobservice:
  jobLoggers:
    - database
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
  podLabels:
    homelab/part-of: harbor
registry:
  registry:
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
  controller:
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
  podLabels:
    homelab/part-of: harbor
  credentials:
    username: my-harbor-registry-user
    existingSecret: my-harbor-registry-user-secret
trivy:
  enabled: false
database:
  type: external
  external:
    host: "harbor-pg-cluster-rw"
    port: 5432
    username: harbor
    coreDatabase: harbor
    existingSecret: harbor-pg-cluster-app
redis:
  type: external
  external:
    addr: redis.redis.svc.cluster.local:6379
metrics:
  enabled: false
  serviceMonitor:
    enabled: false

The above is only for completeness’ sake. Let’s go through the config bit-by-bit. The first part is the setup for external access:

expose:
  type: ingress
  tls:
    enabled: true
    certSource: none
  ingress:
    hosts:
      core: harbor.example.com
    annotations:
      traefik.ingress.kubernetes.io/router.entrypoints: myentrypoint
    harbor:
      labels:
        homelab/part-of: harbor
externalURL: https://harbor.example.com
ipFamily:
  ipv6:
    enabled: false

This uses my Traefik Ingress to provide external connectivity. I’m disabling IPv6 because I don’t have it set up in my Homelab. Please note the (perfectly normal!) spelling of externalURL. I had spelled it wrong, and so all the pull commands which Harbor helpfully shows in the web UI had the default URL in them. One of those things which can really only be solved by staring very intently at the YAML for an extended period of time. 😅

persistence:
  enabled: false
  imageChartStorage:
    disableredirect: true
    type: s3
    s3:
      existingSecret: my-harbor-rgw-secret
      bucket: harbor-random-numbers-here
      regionendpoint: http://rook-ceph-rgw-myobjectstorename.my-rook-cluster-namespace.svc:80
      v4auth: true
      rootdirectory: /harbor
      encrypt: false
      secure: false

Next up is persistence. Harbor has two approaches here. The first one, which is the default and which I’m not using here, stores the data, like container images, on PersistentVolumeClaims. The second one uses S3, as I’m doing here. I disable the registry’s redirect feature, which would normally redirect any requests directly to the S3 storage. But access to my S3 storage is very limited outside the cluster, and with my relatively low levels of activity, I don’t need to reduce the load on Harbor’s registry by enabling it. I’m using my Rook Ceph based S3 setup here. Again for completeness’ sake, here is the manifest for creating the bucket:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: harbor
spec:
  generateBucketName: harbor
  storageClassName: rgw-bulk

I will talk about the secrets setup later in a separate section.

Another important thing to configure when running without persistent volumes is where to store the job logs, e.g. from the automated security scans Harbor can conduct on the images:

jobservice:
  jobLoggers:
    - database
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
  podLabels:
    homelab/part-of: harbor

The important part here is the jobservice.jobLoggers[0]=database setting, which configures the job service to write logs to the Postgres DB.

I’m also disabling all of that security scanning by switching off trivy.enabled.

The next somewhat interesting thing is the database setup:

database:
  type: external
  external:
    host: "harbor-pg-cluster-rw"
    port: 5432
    username: harbor
    coreDatabase: harbor
    existingSecret: harbor-pg-cluster-app

To manage the database, I’m using my CloudNativePG setup. Here are some parts of the database config:

  resources:
    requests:
      memory: 200M
      cpu: 150m
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "50MB"
      effective_cache_size: "150MB"
      maintenance_work_mem: "12800kB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "1536kB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "300"
      work_mem: "128kB"
      huge_pages: "off"
      max_wal_size: "128MB"
      wal_keep_size: "512MB"
  storage:
    size: 1.5G
    storageClass: rbd-fast

I hope this is a good compromise between dumping a long piece of YAML into every post about an app which needs Postgres, and not showing the database setup at all.
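
For orientation, here is a minimal sketch of where those fragments live in a CloudNativePG Cluster manifest. The cluster name is derived from the harbor-pg-cluster-rw service above; the instance count is an assumption:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: harbor-pg-cluster
spec:
  instances: 2                  # assumption, not from this post
  resources:
    requests:
      memory: 200M
      cpu: 150m
  postgresql:
    parameters:
      shared_buffers: "50MB"
      # ...the remaining parameters shown above
  storage:
    size: 1.5G
    storageClass: rbd-fast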

Finally, I’m using my Redis instance for caching and disabling metrics explicitly, so when I get around to gathering all the app level metrics and making dashboards, I’ve got something to grep for in the Homelab repo. 😉

Issues with secrets

I had a couple of issues with the different secrets which Harbor needs. Let’s start with the place where the chart gets it right, the admin credentials:

existingSecretAdminPassword: my-admin-secret
existingSecretAdminPasswordKey: mySecretsKey

The Helm chart doesn’t just allow setting the Secret to use, but also which key in that Secret contains the password. That’s how it should be done.

The credentials for the database were also okay, because the key the Helm chart expects, password, happens to also be the key under which CloudNativePG stores the user password in the Secret it creates with the credentials. What saddened me a bit is that I couldn’t provide the host and port the same way, even though CNPG puts those into that Secret as well.
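
If I remember CNPG’s conventions correctly, the generated harbor-pg-cluster-app Secret contains keys like username, password, dbname, host and port, plus a ready-made connection uri. A quick way to check:

kubectl get secret harbor-pg-cluster-app -n harbor -o json | jq '.data | keys'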

But a lot more annoying were the S3 credentials. For every bucket, Rook creates a Secret with the access key and the secret key, plus a ConfigMap with the semi-randomly generated bucket name and the correct endpoint. It would have been nice if I could have handed those over to the Helm chart directly. Instead, I hardcoded the values in the values.yaml, which means I would have to do some manual intervention if I ever have to recreate it all. For the credentials, I could at least provide the name of an existing Secret. But as per the values.yaml comments, the access key and the secret key need to be put into specific keys in the provided Secret. And those were not the standard key names you would expect, e.g. AccessKey and SecretKey. No, they have to be REGISTRY_STORAGE_S3_ACCESSKEY and REGISTRY_STORAGE_S3_SECRETKEY.

So what to do now? Manually extract the keys from Rook’s Secret and write a new Secret by hand? Luckily, no. The Fediverse came through, and somebody proposed using external-secrets’ Kubernetes provider. This provider allows me to take an existing Kubernetes Secret and automatically create a new Secret from it, with the same data under different keys. This is still a pretty roundabout way, but I decided that it’s preferable to the other options, which would be writing a Secret by hand or forking the Helm chart.

First, we need to define some RBAC objects for use by the SecretStore for the Kubernetes provider.

Here is the ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ext-secrets-harbor
  labels:
    homelab/part-of: harbor

Next, we need a Role for that ServiceAccount to use:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ext-secrets-harbor-role
  labels:
    homelab/part-of: harbor
rules:
  - apiGroups: [""]
    resources:
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - authorization.k8s.io
    resources:
      - selfsubjectrulesreviews
    verbs:
      - create

This allows all accounts using the Role to view Secrets in the Namespace the Role is created in, which in this case is my Harbor Namespace.

Finally, we need a RoleBinding to bind the Role to the ServiceAccount:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    homelab/part-of: harbor
  name: ext-secrets-harbor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ext-secrets-harbor-role
subjects:
- kind: ServiceAccount
  name: ext-secrets-harbor
  namespace: harbor

Once all of that has been created, we can define the SecretStore:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: harbor-secrets-store
  labels:
    homelab/part-of: harbor
spec:
  provider:
    kubernetes:
      remoteNamespace: harbor
      auth:
        serviceAccount:
          name: ext-secrets-harbor
      server:
        caProvider:
          type: ConfigMap
          name: kube-root-ca.crt
          key: ca.crt

One fascinating thing I learned is that Kubernetes puts the CA certs for the kube-apiserver in every Namespace, under a ConfigMap called kube-root-ca.crt.
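
You can have a look at it in any Namespace, for example:

# list the ConfigMap, which exists in every Namespace
kubectl get configmap --all-namespaces --field-selector metadata.name=kube-root-ca.crt
# and print the CA cert itself from the Harbor Namespace
kubectl get configmap kube-root-ca.crt -n harbor -o jsonpath='{.data.ca\.crt}'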

This SecretStore can then be used to take the Secret created by Rook for the S3 bucket and rewrite it to fit the expectations of the Harbor chart as follows:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "harbor-s3-secret"
  labels:
    homelab/part-of: harbor
spec:
  secretStoreRef:
    name: harbor-secrets-store
    kind: SecretStore
  refreshInterval: "1h"
  target:
    creationPolicy: 'Owner'
  data:
    - secretKey: REGISTRY_STORAGE_S3_ACCESSKEY
      remoteRef:
        key: harbor
        property: AWS_ACCESS_KEY_ID
    - secretKey: REGISTRY_STORAGE_S3_SECRETKEY
      remoteRef:
        key: harbor
        property: AWS_SECRET_ACCESS_KEY

This will have external-secrets go to the kube-apiserver and get the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID keys from the harbor Secret, which was previously created automatically by Rook through the ObjectBucketClaim I used to create the S3 bucket for Harbor.
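
In other words, external-secrets maintains a plain Secret in the Harbor Namespace, named after the ExternalSecret since no target name is set, with exactly the keys the chart expects. Roughly like this, values omitted:

apiVersion: v1
kind: Secret
metadata:
  name: harbor-s3-secret
type: Opaque
data:
  REGISTRY_STORAGE_S3_ACCESSKEY: <base64 copy of AWS_ACCESS_KEY_ID>
  REGISTRY_STORAGE_S3_SECRETKEY: <base64 copy of AWS_SECRET_ACCESS_KEY>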

And with these five simple manifests, I could use the Rook S3 Secret with the Harbor Helm chart. 😅

One last thing that tripped me up during setup was the registry credentials. The values.yaml contains these comments on how to set up the credentials:

registry:
  credentials:
    username: "harbor_registry_user"
    password: "harbor_registry_password"
    # If using existingSecret, the key must be REGISTRY_PASSWD and REGISTRY_HTPASSWD
    existingSecret: ""
    # Login and password in htpasswd string format. Excludes `registry.credentials.username`  and `registry.credentials.password`. May come in handy when integrating with tools like argocd or flux. This allows the same line to be generated each time the template is rendered, instead of the `htpasswd` function from helm, which generates different lines each time because of the salt.
    # htpasswdString: $apr1$XLefHzeG$Xl4.s00sMSCCcMyJljSZb0 # example string
    htpasswdString: ""

What I did not initially get from that comment was that when using an existing Secret, both the clear text password and the htpasswd string are required. This put me into an amusing conundrum: I did not have a single host where I had htpasswd available. 😂 I ended up using the Apache container just to generate the htpasswd string:

docker run -it httpd htpasswd -n -B my-harbor-registry-user

I then put that string into the Secret verbatim and was finally able to start the Harbor instance.
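
For reference, the Secret then ends up looking something like this, with the htpasswd line taken verbatim from the output of the command above (user name, password and hash are placeholders, of course):

apiVersion: v1
kind: Secret
metadata:
  name: my-harbor-registry-user-secret
type: Opaque
stringData:
  REGISTRY_PASSWD: some-cleartext-password
  REGISTRY_HTPASSWD: my-harbor-registry-user:$2y$05$someBcryptHashFromHtpasswd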

Transferring my internal images to Harbor

The first step I took was to transfer all of my internal images over to Harbor, by adapting the CI jobs which create them and pointing them to Harbor.

I’ve currently got five internal images, most of them just copies of official images with some additions. I create them with Drone CI, which I will replace with Woodpecker later as part of the migration.

The first step in transferring the images was to set up a user for the CI in Harbor. This can be done with the Harbor Terraform provider, but I did it manually for now. Then I created a “homelab” project for those Docker images.

For my image repository, which houses the Dockerfiles for most of my internal images, I have a .drone.jsonnet file which looks like this:

local alpine_ver = "3.19.1";

local Pipeline(img_name, version, pr, alpine=false, alpine_ver_int=alpine_ver) = {
  kind: "pipeline",
  name:
      if pr then
        "Build "+img_name
      else
        "Release "+img_name,
  platform: {
    arch: "arm64",
  },
  steps: [
    {
      name:
      if pr then
        "Build Image"
      else
        "Release Image",
      image: "thegeeklab/drone-docker-buildx",
      privileged: true,
      settings: {
        repo: "harbor.example.com/homelab/"+img_name,
        registry: "harbor.example.com",
        username: "myuser",
        password: {
          from_secret: "harbor-secret",
        },
        dockerfile: img_name+"/Dockerfile",
        context: img_name+"/",
        mirror: "https://harbor-mirror.example.com",
        debug: true,
        buildkit_config: 'debug = true\n[registry."docker.io"]\n  mirrors = ["harbor.example.com/dockerhub-cache"]\n[registry."quay.io"]\n  mirrors = ["harbor.example.com/quay.io-cache"]\n[registry."ghcr.io"]\n  mirrors = ["harbor.example.com/github-cache"]',
        tags: [version, "latest"],
        custom_dns: ["10.0.0.1"],
        build_args: std.prune([
          img_name+"_ver="+version,
          if alpine then
            "alpine_ver="+alpine_ver_int
        ]),
        platforms: [
          "linux/amd64",
          "linux/arm64",
        ],
        dry_run:
        if pr then
          true
        else
          false
      },
    }
  ],
  trigger:
    if pr then
    {
      event: {
        include: [
          "pull_request"
        ]
      }
    }
    else
    {
      branch: {
        include: [
          "master"
        ]
      },
      event: {
        exclude: [
          "pull_request"
        ]
      }
    }
};

local Image(img_name, version, alpine=false, alpine_ver_int=alpine_ver) = [
  Pipeline(img_name, version, true, alpine, alpine_ver_int),
  Pipeline(img_name, version, false, alpine, alpine_ver_int)
];

Image("gitea", "1.21.10")

This configuration uses buildkit via the drone-docker-buildx plugin, which is no longer actively developed. That’s one of the reasons why I’m planning to migrate to Woodpecker. I’m creating images for both arm64 and amd64, as most of my Homelab consists of Raspberry Pis.

One snag I hit during this part of the setup was when I tried to switch the Fluentd image in my logging setup, which was already running on Kubernetes, over to Harbor. I only got pull failures, without any indication of what was going wrong. It turned out that this was the first time my Kubernetes nodes were trying to access something running in my cluster behind the Traefik ingress at example.com. And I yet again had to adapt my NetworkPolicy for said Traefik Ingress. Looking at the Cilium monitoring, I saw the following whenever one of my k8s hosts tried to pull the image:

xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node->39413: 10.8.5.218:55064 -> 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node->39413: 10.8.5.218:55064 -> 10.8.4.134:8000 tcp SYN
xx drop (Policy denied) flow 0x0 to endpoint 1868, ifindex 6, file bpf_lxc.c:2069, , identity remote-node->39413: 10.8.5.218:55064 -> 10.8.4.134:8000 tcp SYN

Here, the endpoint with ID 1868 is Traefik, and we can see that access from the remote-node identity is being denied. This was due to the fact that while I had allowed access from world to Traefik, world in Cilium only covers everything outside the Kubernetes cluster. Cluster nodes, including the local host, need to be allowed explicitly. So I had to add the following to my Traefik NetworkPolicy:

ingress:
  - fromEntities:
    - cluster

cluster includes both the local host and all other nodes in the cluster.
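
For reference, that fragment sits inside a CiliumNetworkPolicy which looks roughly like this. The Namespace and the endpoint selector labels are placeholders, your Traefik labels will differ:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: traefik-allow-cluster
  namespace: traefik                      # placeholder Namespace
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: traefik     # placeholder label for the Traefik Pods
  ingress:
    - fromEntities:
        - world                           # external clients, as before
        - cluster                         # the new entry: local host plus all other cluster nodes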

With that fixed, my homelab project was able to provide images to both my Docker-based Nomad cluster and my cri-o-based Kubernetes cluster:

A screenshot of the repositories in the 'homelab' project. It shows five repositories: hn-backup, fluentd, hugo, gitea and taskd. They have from 2 to 5 artifacts and 2 to 19 pulls. Overall, the quota used is 1.43 GiB. The access level is shown as 'Public'

My homelab project with five images after a couple of days of usage.

Setting Harbor up as a pull-through cache

With the handling of my own images finished and working, the last step remaining is the setup of pull-through caches for some public image registries. I wanted to set up an internal mirror for the following registries:

  1. DockerHub
  2. GitHub’s ghcr.io
  3. registry.k8s.io
  4. quay.io

In Harbor, each mirror needs to be set up as a separate project, and it needs to be accessed at “harbor.example.com/project-name”. This is an issue for Docker daemons, which I will go into detail about later.

Here is an example for setting up the quay.io cache. First, an endpoint needs to be defined:

A screenshot of the 'New Registry Endpoint' dialogue in Harbor. In the menu on the left, the entry 'Registries' is chosen, and then the button 'NEW ENDPOINT' was clicked. The dialogue itself has the 'Provider' dropdown filled with 'Quay'. The name is given as 'Quay.io Cache' and the 'Endpoint URL' as 'https://quay.io'.

Setting up an endpoint for quay.io

After the endpoint is defined, the project needs to be created:

A screenshot of the 'New Project' dialogue in Harbor. In the menu on the left, the entry 'Projects' is chosen, and then the button 'NEW PROJECT' was clicked. In the dialogue, the project name is 'quay-cache', with the 'Access Level' checkbox labeled 'Public' being checked. The project quota is left at the default '-1'. The 'Proxy Cache' toggle is enabled, and the previously shown 'Quay.io cache' is chosen in the dropdown.

Setting up the mirror project for quay.io

After these steps are done, a mirror for quay.io will be available at https://harbor.example.com/quay-cache.
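
Pulling through the cache is then just a matter of prefixing the original repository path with the project name, for example for quay.io/prometheus/node-exporter:

docker pull harbor.example.com/quay-cache/prometheus/node-exporter:latest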

Here is a table of the configs for my current mirrors:

Name               Endpoint URL               Provider
dockerhub-cache    https://hub.docker.com     Docker Hub
github-cache       https://ghcr.io            Github GHCR
k8s-cache          https://registry.k8s.io    Docker Registry
quay.io-cache      https://quay.io            Quay

But there is an issue with Harbor’s subpath approach to projects/mirrors: Docker only supports the registry-mirror option, and that is only used for DockerHub images, not for any other registry. And the main issue: it does not support paths in the given mirror URL. Docker always expects the registry at /. This obviously doesn’t work with Harbor’s domain/projectName/ scheme.

At the same time, cri-o does not suffer from this issue at all. It follows the containers-registries spec. With this spec, and the containers-registries.conf file, it can be configured to rewrite pulls to any registry URL you like. I will explain this later, but let’s start with the more complicated Docker daemon case.

What does Docker actually do when pulling?

While trying to figure out how to solve the issue with Docker’s registry-mirror option, I found this blog post, which had an excellent idea: Just rewrite Docker’s requests to point them to the right Harbor URL. And it worked. 🙂

Let’s start by having a look at the HTTP requests Docker makes when issuing the following command:

docker pull postgres:10.0

As the command does not have a registry domain defined, Docker defaults to DockerHub. Let’s imagine the Docker daemon is configured with --registry-mirror https://harbor.example.com.

The first request Docker would try to make is this:

GET https://harbor.example.com/v2/

It would expect a 401 return code and a www-authenticate header. This header looks something like this in the case of Harbor:

www-authenticate: Bearer realm="https://harbor.example.com/service/token",service="harbor-registry"

Next, Docker tries to request a token:

https://harbor.example.com/service/token?scope=repository:library/postgres:pull&service=harbor-registry

Armed with that token, it would look for the manifest file for the postgres:10.0 image:

https://harbor.example.com/v2/library/postgres/manifests/10.0

This is where things start going wrong with Harbor, because this request, sent to Harbor, would look for the library project, which does exist by default, but is not a DockerHub mirror.
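
To make this concrete, here is roughly what that handshake looks like when replayed by hand with curl; jq is only used to extract the token from the JSON response:

# 1. Ping the registry. Harbor answers with a 401 and the www-authenticate header.
curl -si https://harbor.example.com/v2/ | grep -i www-authenticate

# 2. Fetch a token for the repository named in the scope parameter.
TOKEN=$(curl -s "https://harbor.example.com/service/token?scope=repository:library/postgres:pull&service=harbor-registry" | jq -r '.token')

# 3. Ask for the manifest with that token. On a plain Harbor install, this looks
#    in the "library" project instead of the DockerHub cache.
curl -si -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  https://harbor.example.com/v2/library/postgres/manifests/10.0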

My first attempt to solve this issue was pretty simplistic: I configured an additional route for the harbor-core service in my Traefik ingress, with a path rewrite that turns requests like /v2/library/postgres/manifests/10.0 into /v2/dockerhub-cache/library/postgres/manifests/10.0. It looked like this:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: harbor-docker-mirror
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "harbor-mirror.example.com"
    external-dns.alpha.kubernetes.io/target: "ingress-k8s.example.com"
spec:
  entryPoints:
    - secureweb
  routes:
    - kind: Rule
      match: Host(`harbor-mirror.example.com`)
      middlewares:
        - name: project-rewrite
          namespace: harbor
      services:
        - kind: Service
          name: harbor-core
          namespace: harbor
          port: http-web
          scheme: http
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: project-rewrite
spec:
  replacePathRegex:
    regex: ^\/v2\/(.+)$
    replacement: /v2/dockerhub-cache/${1}

This worked somewhat. The initial request for /v2/ was rewritten. But then I did not see the /service/token request hit this new harbor-mirror domain at all; it went to the harbor domain instead. And that request succeeded: Docker got a token from that endpoint. But that token would have been for the /library/postgres repository. The next request then went through harbor-mirror again, which meant it was correctly rewritten:

/v2/dockerhub-cache/library/postgres/manifests/10.0

But Harbor would now return a 401, because the token fetched in the previous step was for /library/postgres/, while the request was now for /dockerhub-cache/library/postgres.

To fix this issue, I did not just need to rewrite the query parameter of the /service/token request, but also the response to the request before it. That’s because the domain to contact for the /service/token request is taken from the www-authenticate header of the response to the initial /v2/ request. And Harbor would of course always answer with a fixed domain there, the one from the externalURL parameter in the Helm chart, which is not the route with the rewrite.

So I had to do two additional things, in addition to rewriting paths accessing /v2/:

  1. Rewrite the www-authenticate header from the response to the initial /v2/ request to make the Realm point to the special mirror domain, not Harbor’s domain
  2. Rewrite the scope=repository: in the /service/token request to prefix it with the name of the DockerHub mirror project in Harbor

It turned out that Traefik wasn’t really well equipped for that. It can of course rewrite headers, but there’s no facility to work with regexes: I could only have replaced the entire www-authenticate header with a static value. And that seemed a bit too inflexible to me.

So instead, I decided to set up another Pod, running the Caddy webserver, and using it to do the rewrites. I decided to use Caddy instead of Nginx, as the blog post I linked above did, because I’ve already got another Caddy serving as a webserver for my Nextcloud setup, but currently don’t have any Nginx in my Homelab.

I kept the Caddy setup pretty simple. Here’s the Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: caddy-dockerhub-mirror
spec:
  replicas: 1
  selector:
    matchLabels:
      app: caddy
  template:
    metadata:
      labels:
        app: caddy
    spec:
      automountServiceAccountToken: false
      containers:
        - name: caddy
          image: caddy:2.7.6
          volumeMounts:
            - name: config
              mountPath: /etc/caddy/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: caddy-http
              containerPort: 8080
              protocol: TCP
      volumes:
        - name: config
          configMap:
            name: caddy-mirror-conf

Then there’s also a Service required:

apiVersion: v1
kind: Service
metadata:
  name: caddy-mirror
spec:
  type: ClusterIP
  selector:
    app: caddy
  ports:
    - name: caddy-http
      port: 8080
      targetPort: caddy-http
      protocol: TCP

And finally an IngressRoute for my Traefik ingress:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: harbor-docker-mirror
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "harbor-mirror.example.com"
    external-dns.alpha.kubernetes.io/target: "ingress-k8s.example.com"
spec:
  entryPoints:
    - secureweb
  routes:
    - kind: Rule
      match: Host(`harbor-mirror.example.com`)
      services:
        - kind: Service
          name: caddy-mirror
          namespace: harbor
          port: caddy-http
          scheme: http

The really interesting part is the Caddy config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: caddy-mirror-conf
data:
  Caddyfile: |
    {
      admin off
      auto_https off
      log {
        output stdout
        level INFO
      }
    }
    :8080 {
      log {
        output stdout
        format filter {
          wrap json
          fields {
            request>headers>Authorization delete
            request>headers>Cookie delete
          }
        }
      }
      @v2-subpath {
        path_regexp repo ^/v2/(.+)
      }

      map /service/token {query.scope} {new_scope} {
        ~(repository:)(.*) "${1}dockerhub-cache/${2}"
      }

      rewrite /service/token ?scope={new_scope}&service={query.service}

      header >Www-Authenticate harbor.example.com harbor-mirror.example.com

      rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}

      reverse_proxy http://harbor-core.namespace-of-harbor.svc.cluster.local {
        header_up Host "harbor.example.com"
      }
    }

The first rewrite is for all requests which go to /v2/. Because I don’t want to append the dockerhub-cache/ to the URL for the initial Docker daemon request for /v2/, I went with the ^/v2/(.+) regex for the matcher:

@v2-subpath {
  path_regexp repo ^/v2/(.+)
}

rewrite @v2-subpath /v2/dockerhub-cache/{re.repo.1}

These two lines define a rewrite for all paths /v2/.+ to /v2/dockerhub-cache/..., so that any request going over this mirror automatically accesses the DockerHub mirror project on my Harbor instance.

The next line just replaces the canonical Harbor domain with the specific mirror domain in the www-authenticate header, so that the subsequent request for the token goes through the mirror as well, instead of directly going to Harbor:

header >Www-Authenticate harbor.example.com harbor-mirror.example.com

With this, the realm="https://harbor.example.com/service/token" part of the header is rewritten to realm="https://harbor-mirror.example.com/service/token".

Now, the request for the token also goes through the Caddy instance, and I can rewrite the repository in the request’s scope parameter:

map /service/token {query.scope} {new_scope} {
  ~(repository:)(.*) "${1}dockerhub-cache/${2}"
}
rewrite /service/token ?scope={new_scope}&service={query.service}

The map directive matches only on requests to /service/token and maps the scope query parameter to a Caddy-internal variable new_scope: I split the scope=repository:library/postgres:pull parameter and graft the necessary dockerhub-cache/ prefix in front of the library/postgres repository name. With this, the token request is made for the correct repository, and Harbor will accept requests for the image files accompanied by this token.

One note: I had also tried to rewrite the entire query part of the request in one go, but I hit a weird issue. When operating on the whole query as one, Caddy would urlencode more parts of the query, in particular the = sign in the scope and service parameters. And for some reason, Harbor did not like that. It would only spit out a token when the = signs were left as-is.

And with all of this combined, I could now set the registry-mirror option for my Docker agents to https://harbor-mirror.example.com, and Docker pulls worked as intended and used the dockerhub-cache mirror on my Harbor instance without issue. 🎉

Configuring Docker and cri-o

Onto the last step: Configuring the Docker daemons in my Nomad cluster and the cri-o daemons in my Kubernetes cluster to use the new Harbor mirrors.

As noted above, Docker only supports mirrors for DockerHub, nothing else. So configuring those daemons is pretty simple, just add this to the /etc/docker/daemon.json file:

{
    "registry-mirrors": [
        "https://harbor-mirror.example.com"
    ]
}

Luckily, registry-mirrors is one of the Docker config options which can be live-reloaded, so a pkill --signal SIGHUP dockerd is enough; no restart of the daemon or the running containers is necessary.
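
docker info lists the configured mirrors, so a quick check after the reload looks like this:

docker info | grep -A 1 "Registry Mirrors"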

The cri-o config is a bit more involved, but it does have the benefit of supporting mirrors for any external registry you like. Cri-o implements the containers-registries config files. These can also be reloaded without any restarts by sending the daemon a SIGHUP via pkill --signal SIGHUP crio.

The mirror configs all have a similar format. As an example, the config for registry.k8s.io looks like this:

[[registry]]
prefix = "registry.k8s.io"
insecure = false
blocked = false
location = "registry.k8s.io"
[[registry.mirror]]
location = "harbor.example.com/k8s-cache"

I place that file into /etc/containers/registries.conf.d/k8s-mirror.conf, issue a SIGHUP, and cri-o will happily start pulling from the Harbor mirror whenever an image from the official k8s registry is required. And like Docker, it will pull from the original registry if the mirror is down.
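
The DockerHub mirror for cri-o follows the same pattern, just with docker.io as the prefix. No Caddy detour needed here, since cri-o happily accepts the project subpath in the mirror location. A sketch, using the dockerhub-cache project from the table above:

[[registry]]
prefix = "docker.io"
insecure = false
blocked = false
location = "docker.io"
[[registry.mirror]]
location = "harbor.example.com/dockerhub-cache"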

And with that, I’ve got my container registry needs migrated fully to Kubernetes with Harbor. Especially the piece with the request rewrites to get a DockerHub mirror for Docker daemons going on Harbor was interesting to figure out and very satisfying to get working.