As I’ve mentioned in my last k8s migration post, I’m working on writing a Homelab backup operator for my Kubernetes cluster. And I’ve run into some RBAC/permission issues I can’t quite figure out. So let’s see whether writing about it helps. 🙂

First, a short overview of the plan. I’m using the kopf framework to build a Kubernetes operator. This operator’s main goal is to handle HomelabServiceBackup resources. These will contain a list of PersistentVolumeClaims and S3 buckets which need to be backed up. I intend for there to be one HomelabServiceBackup object for every service, located in the service’s Namespace.

But I started out by defining a HomelabBackupConfig resource. This will contain configuration common to all service backups, things like the hostname of the S3 server to store the backups on and the image to be used for the backup Jobs. There will only ever be one instance of this custom resource, and it should always reside in the Namespace of the operator itself. Likewise, there should only ever be one operator for the entire k8s cluster.
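
To make that a bit more concrete, here is a rough sketch of what I imagine the two resources looking like. This is purely illustrative: the actual schemas don’t exist yet, and every field name below is made up, so take it only as a picture of the plan:

apiVersion: mei-home.net/v1alpha1
kind: HomelabBackupConfig
metadata:
  name: backup-config          # hypothetical name; there will only ever be one
  namespace: backups           # always lives in the operator's own Namespace
spec:
  s3Host: s3.example.com       # assumed field: the S3 server to store backups on
  backupImage: example/backup-runner:latest  # assumed field: image for the backup Jobs
---
apiVersion: mei-home.net/v1alpha1
kind: HomelabServiceBackup
metadata:
  name: my-service-backup      # hypothetical; one per service
  namespace: my-service        # lives in the service's Namespace
spec:
  persistentVolumeClaims:      # assumed field: PVCs which need to be backed up
    - my-service-data
  s3Buckets:                   # assumed field: S3 buckets which need to be backed up
    - my-service-bucket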

This all seemed sensible to me until this afternoon, when I had finally finished all the yak-shaving every new project needs: creating the repo, configuring the CI for image builds and unit tests, and so on. At last I had a container image I could run, with a very simple implementation:

import kopf
import logging

# Called for HomelabBackupConfig objects created while the operator is running.
@kopf.on.create('homelabbackupconfigs')
def create_handler(spec, status, meta, **kwargs):
    logging.info(f"Create handler called with meta: {meta}")
    logging.info(f"Create handler called with spec: {spec}")
    logging.info(f"Create handler called with status: {status}")

# Called for objects which already existed when the operator (re)started.
@kopf.on.resume('homelabbackupconfigs')
def resume_handler(spec, status, meta, **kwargs):
    logging.info(f"Resume handler called with meta: {meta}")
    logging.info(f"Resume handler called with spec: {spec}")
    logging.info(f"Resume handler called with status: {status}")

# Called on changes to existing objects; also receives a diff of the change.
@kopf.on.update('homelabbackupconfigs')
def update_handler(spec, status, meta, diff, **kwargs):
    logging.info(f"Update handler called with meta: {meta}")
    logging.info(f"Update handler called with spec: {spec}")
    logging.info(f"Update handler called with status: {status}")
    logging.info(f"Update handler called with diff: {diff}")

# Called when an object is deleted.
@kopf.on.delete('homelabbackupconfigs')
def delete_handler(spec, status, meta, **kwargs):
    logging.info(f"Delete handler called with meta: {meta}")
    logging.info(f"Delete handler called with spec: {spec}")
    logging.info(f"Delete handler called with status: {status}")

The intention here was merely to get a feeling for what data I would actually receive for each of the different events, and to play around with when each of these handlers gets called.

For the first deployment, I launched kopf with the -A flag, which makes it use the cluster-scoped Kubernetes APIs to watch every Namespace. As noted above, I want every Namespace to be watched, as each of them might contain a HomelabServiceBackup object describing the backups for the service residing in that Namespace. But I started out with only the HomelabBackupConfig CRD defined, as that’s the first step in my implementation plan. The content of the CRD is not important for now; I will show it in a later post, when I’ve actually got the implementation ready.
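
For reference, the relevant part of the Deployment looked roughly like this. This is a sketch: the image name and the module path are placeholders, but the ServiceAccount is the hlbo-account that the RBAC below is bound to:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hlbo
  namespace: backups
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hlbo
  template:
    metadata:
      labels:
        app: hlbo
    spec:
      serviceAccountName: hlbo-account   # the account the Role below is bound to
      containers:
        - name: operator
          image: registry.example.com/hlbo:latest   # placeholder image name
          # -A is the short form of --all-namespaces
          command: ["kopf", "run", "--all-namespaces", "/app/operator.py"]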

I also needed to provide proper RBAC for the deployment, as the operator needs access to the API server. My thoughts went like this: For now, I only need the HomelabBackupConfig, and I only need that in the same Namespace the operator is running in. So I created the following Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: hlbo-role
  namespace: backups
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]
  - apiGroups: ["mei-home.net"]
    resources:
      - homelabbackupconfigs
    verbs:
      - get
      - watch
      - list
      - patch
      - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: hlbo-role
  namespace: backups
  labels:
    homelab/part-of: hlbo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: hlbo-role
subjects:
  - kind: ServiceAccount
    name: hlbo-account
    namespace: backups

This produced a number of errors when trying to launch my rudimentary operator:

[2024-05-12 14:19:55,454] kopf._core.reactor.o [ERROR   ]
Watcher for homelabbackupconfigs.v1alpha1.mei-home.net@none has failed:
'homelabbackupconfigs.mei-home.net is forbidden: User "system:serviceaccount:backups:hlbo-account" cannot list resource "homelabbackupconfigs" in API group "mei-home.net" at the cluster scope'

Okay, this seems reasonably clear to me: with -A, kopf lists the resource at the cluster scope, but I’ve only created a Role and a RoleBinding for the backups Namespace, where the operator resides. A namespaced Role can never satisfy a cluster-scoped list.

I also tried another variant. Instead of using -A to make kopf use the cluster-scoped API, one can provide --namespace=*. This tells kopf to use the namespaced API, but to list all Namespaces and watch each of them individually. I then allowed kopf to list all Namespaces, while still only granting it access to HomelabBackupConfigs in the backups Namespace. This results in a lot of errors when kopf tries to watch HomelabBackupConfigs in Namespaces other than backups, but the operator keeps running. So this might be a “solution”.
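
Listing the Namespaces themselves is still a cluster-scoped operation, so this variant needs a small ClusterRole on top of the Role above; something along these lines (a sketch, the names are mine):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hlbo-namespace-reader   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hlbo-namespace-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hlbo-namespace-reader
subjects:
  - kind: ServiceAccount
    name: hlbo-account
    namespace: backups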

I could also return to using -A and just configure everything in a ClusterRole. But that would grant the operator far more permissions than it needs. I will also need to give it access to the Jobs API for the actual backup runs, and I don’t want to do that cluster-wide either.
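
For illustration, that cluster-wide variant would look roughly like this (bound via a matching ClusterRoleBinding; the name is mine, and the Jobs rule anticipates permissions I haven’t actually defined yet):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hlbo-clusterrole   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]
  - apiGroups: ["mei-home.net"]
    resources: ["homelabbackupconfigs"]
    verbs: ["get", "watch", "list", "patch", "update"]
  # Assumed future rule: the backup Jobs, now only grantable in every Namespace.
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "watch", "list", "create", "delete"]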

And finally, the individual handlers don’t allow specifying a Namespace in which to watch a particular resource. The only configuration is the command-line flag, and that applies to all resources and their handlers.

So it looks like I have to look for another framework, as kopf doesn’t seem to allow me to do things in the least-privilege way I want them done. 😔

If you’ve got a good idea or you think I’ve overlooked something, please feel free to ping me on the Fediverse.