Wherein I migrate my Jellyfin instance to the k8s cluster.

This is part 19 of my k8s migration series.

I’m running a Jellyfin instance in my Homelab to play movies and TV shows. I don’t have a very fancy setup, no re-encoding or anything like that. I’m just using Direct Play, as I’m only watching things on my desktop computer.

Jellyfin doesn’t have any external dependencies at all, so there’s only the Jellyfin Pod itself to configure. It also doesn’t have a proper configuration file. Instead, it’s configured through the web UI and a couple of command line options. For that reason, there are no Secrets or ConfigMaps in this deployment. All I’ve got is a PVC holding the config and Jellyfin’s cache, plus another CephFS volume for the media collection.
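
For completeness, the config/cache PVC is an entirely ordinary dynamically provisioned volume. A minimal sketch of what it might look like (access mode, size and storage class name here are placeholders, not necessarily what I’m actually running):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-config-volume
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi # illustrative size
  storageClassName: rook-ceph-block # placeholder, substitute your cluster's StorageClass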

Said media collection volume will be the main focus of this post, because everything else about the setup follows my standard k8s app setup pretty closely. I had originally planned to also dive a bit (okay, a lot 😅) into the metrics of the copy operation, but that rather quickly turned into a rabbit hole all its own, and so I decided to declare the beginning of operation “articles, not tomes” and split it out into another post that will follow shortly after this one.

Setting up the media volume

For my media volume, I had already been using a CephFS volume in the Nomad job setup. I had two reasons for this:

  1. I need to mount the volume from two places at the same time: the Jellyfin job and my main desktop
  2. I want “unlimited” space

Ceph RBD volumes were out of the question, because they always need to have a size set. They can’t just grow to fill the entire space available in their Ceph pool. CephFS volumes are different: by default, they don’t have any size restriction and can use the entire data pool of the CephFS they were created on. That means I don’t have to worry about extending the volume at some point. In addition, I regularly copy new files onto the volume from my desktop when expanding my media collection, so I also need to be able to mount it on two machines at the same time and write to it from both.

These two points make CephFS the perfect fit for the media volume. But it left me with a problem: I needed a k8s PVC to mount into the Jellyfin Pod, and PVCs always have to have a capacity request set. In my initial tests, I tried simply removing the size from the manifest of a test PVC, but k8s rejected it when I tried to apply it. The same thing happened when I set the size to 0 instead.
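
For illustration, this is roughly the kind of manifest that got rejected, since spec.resources.requests.storage is a required field (names are just examples):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs # example StorageClass name
  # resources.requests.storage is missing, so the API server rejects this manifest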

So back to the drawing board it was. Luckily for me, @beyondwatts pointed me to static PVCs, which can be used to make manually created CephFS and RBD volumes available as PVCs in Kubernetes. This is a feature of the Ceph CSI driver. The documentation for the feature can be found here.

I created my new media volume (technically a CephFS subvolume) with the following Ceph commands:

ceph fs subvolumegroup create homelab-fs static-pvcs
ceph fs subvolume create homelab-fs media static-pvcs

After creation, the ceph fs subvolume info homelab-fs media static-pvcs output looks like this:

{
    "atime": "2025-02-11 22:46:35",
    "bytes_pcent": "undefined",
    "bytes_quota": "infinite",
    "bytes_used": 0,
    "created_at": "2025-02-11 22:46:35",
    "ctime": "2025-02-11 22:46:35",
    "data_pool": "homelab-fs-bulk",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "flavor": 2,
    "gid": 0,
    "mode": 16877,
    "mon_addrs": [
        "300.300.300.1:6789",
        "300.300.300.2:6789",
        "300.300.300.3:6789"
    ],
    "mtime": "2025-02-11 22:46:35",
    "path": "/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}

Note especially the bytes_quota: infinite part, which was what I was after. Next, I created the PersistentVolume for it:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: jellyfin-media
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi
  csi:
    driver: rook-ceph.cephfs.csi.ceph.com
    controllerExpandSecretRef:
      name: rook-csi-cephfs-provisioner
      namespace: rook-cluster
    nodeStageSecretRef:
      name: rook-csi-cephfs-node
      namespace: rook-cluster
    volumeAttributes:
      "fsName": "homelab-fs"
      "clusterID": "rook-cluster"
      "staticVolume": "true"
      "rootPath": /volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca
    volumeHandle: jellyfin-media
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

I mostly copied this from another CephFS volume I already had as scratch space for my backup setup. The important parts here are the spec.csi.volumeAttributes.staticVolume: "true" entry and the rootPath. The value for the root path can be found with the following command:

ceph fs subvolume getpath homelab-fs media static-pvcs

The PersistentVolumeClaim then looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-media
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: jellyfin-media

Because it’s a CephFS subvolume, I could use the ReadWriteMany access mode.

But when trying to launch a Pod using the PVC, I got this error message initially:

MountVolume.MountDevice failed for volume "jellyfin-media" : rpc error: code = Internal desc = failed to get user credentials from node stage secrets: missing ID field 'userID' in secrets

This showed up in the Events of the Pod. The issue is mentioned in the Rook Docs, and it needs to be solved by manually creating another Secret. I’m not sure why the Ceph CSI driver doesn’t create that Secret automatically, as it’s just a copy of the rook-csi-cephfs-node Secret with different names for the data keys.

I did the copy by first fetching the rook-csi-cephfs-node secret:

kubectl get -n rook-cluster secrets rook-csi-cephfs-node -o yaml > csi-secret.yaml

From that csi-secret.yaml I removed all of the runtime information added by Kubernetes and renamed the keys like this:

  • adminID -> userID
  • adminKey -> userKey
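
The result then looks roughly like this (the Secret’s name is just an example, and the values are the ones copied over from the original Secret):

apiVersion: v1
kind: Secret
metadata:
  name: rook-csi-cephfs-static-node # example name, whatever you point nodeStageSecretRef at
  namespace: rook-cluster
data:
  userID: <base64 value copied from adminID>
  userKey: <base64 value copied from adminKey>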

I then applied the new Secret to the cluster and pointed the spec.csi.nodeStageSecretRef.name property of the PersistentVolume at it. After that, the Pod was able to mount the CephFS static volume without issue. What I’m still wondering about is why these static PVCs need this special handling, when dynamically created CephFS PVCs don’t.

The last step of the preparation was to make sure that I could also mount the CephFS subvolume on my desktop machine. This, quite honestly, involved a bit of silliness. In my existing configuration, I just had the name option set for the mount, giving the Ceph user name to use for authentication. The mount then automatically uses the /etc/ceph/ceph.conf file to get the MON daemon IPs for the initial cluster contact, and the ceph.client.<username>.keyring file from the same directory for the key. I couldn’t reuse that approach, because I’ve got other mounts from the baremetal cluster that I need to keep for now.

But as per the mount.ceph man page, there is a secretfile option. In my naivete, I assumed this option takes the path to a keyring file, which would make sense, because keyring files are how Ceph credentials are provided everywhere else. But no. The secretfile option expects a file that contains only the key, and nothing else. If you hand it a full keyring file, the mount command outputs an error like this:

secret is not valid base64: Invalid argument.
adding ceph secret key to kernel failed: Invalid argument
couldn't append secret option: -22

With that finally figured out, I created the Ceph config file for the Rook cluster with this command:

ceph config generate-minimal-conf

Then I was able to mount the subvolume with this command:

mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/temp -o name=myuser,secretfile=/etc/ceph/ceph-rook.secret,conf=/etc/ceph/ceph-rook.conf

What I really like about working with Rook instead of baremetal Ceph is that I can create additional users with Kubernetes manifests so I can version control them, instead of having to document long sequences of commands in a runbook:

apiVersion: ceph.rook.io/v1
kind: CephClient
metadata:
  name: myuser
spec:
  caps:
    mds: 'allow rw path=/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca'
    mon: 'allow r'
    osd: 'allow rw tag cephfs data=homelab-fs'

This will allow the user to access only that specific static volume in the cluster.

Copying the media collection

My media collection has a size of about 1.7 TiB. I knew that copying it over would take quite a while, so I planned to do it from my Command&Control host. But then I got a weird feeling and decided to check the networking diagram. It looks something like this:

[Image: Network diagram with the packet flow for the copy operation. The 'Baremetal Ceph Host' and the 'Rook Ceph Host' are in the same VLAN, while the 'Copy Host' sits in a different one. All of them, plus the router that routes between the VLANs, are connected to the same switch. The flow starts at the Baremetal Ceph Host, goes via the switch up to the router, back down through the switch to the Copy Host, from there out to the switch and up to the router again, and finally back down through the switch to the Rook Ceph Host.]

The issue here is the fact that my C&C host, called the Copy Host here, is in a different VLAN than the baremetal and Rook Ceph hosts. This means that routing needs to happen for packets to travel between the Ceph hosts and the copy host, which in turn means that all packets need to pass through the router. That would be fine if the packets only needed to pass through the router once. But in truth, they need to pass through it twice, and they cross the same NIC on the router a full four times.

The packets go from the source, the baremetal Ceph cluster, up to the router via the link from the switch. Pass Nr. 1. Then they go down that same link again to reach the C&C host on its VLAN. Pass Nr. 2. The C&C host then sends them to the router again, now with the Rook Ceph host as the destination. Pass Nr. 3. And finally, the router sends the packets down that link between router and switch one last time, Pass Nr. 4, to arrive at the Rook Ceph host.

So because each packet crosses that link twice in each direction, my maximum copy speed is halved to 500 Mbit/s, a mere 62.5 MByte/s, slower even than the HDDs involved in this copy process.

I was contemplating which Homelab host to take out and install the necessary tools on when @badnetmask, rightly, asked why I didn’t just launch a Pod somewhere. And that’s what I finally went with.

I then remembered that there is a Rook Ceph Toolbox with all the necessary tools already installed, and I decided to try that. After copying the credentials over, similarly to what I explained above for my desktop mounts, I got an error:

bash-5.1$ mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name=admin
mount: drop permissions failed.

I then changed the Pod’s YAML a bit to run it as root. That gave me an error again, but at least a different one:

[root@rook-ceph-tools-584df95dcb-vdwqc /]# mount -t ceph :/volumes/static-pvcs/media/9a1f1581-6749-4146-a2aa-251fe2b58eca /mnt/rook -o name=admin
Unable to apply new capability set.
modprobe: FATAL: Module ceph not found in directory /lib/modules/5.15.0-131-generic
failed to load ceph kernel module (1)
Unable to apply new capability set.
unable to determine mon addresses

To get rid of the failed attempt to load the Ceph kernel module, I also added the /lib/modules directory as a volume to the Pod. This worked and got rid of the fatal modprobe error, but still left me with the other errors. So, throwing up my hands, I set securityContext.privileged: true. I’m still a bit surprised that Linux doesn’t have a specific capability just for mounting. Perhaps mounting is simply so powerful that it’s gated behind the catch-all CAP_SYS_ADMIN anyway?
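
If I ever revisit this, the first thing I’d try instead of full privileged mode is granting just CAP_SYS_ADMIN to the toolbox container. I haven’t tested whether that alone is enough for the kernel CephFS mount, so consider this a sketch:

securityContext:
  runAsNonRoot: false
  capabilities:
    add:
      - SYS_ADMIN # mounting requires CAP_SYS_ADMIN; untested whether this alone is enough here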

The final Deployment I used:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-tools
  namespace: rook-cluster # namespace:cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-tools
  template:
    metadata:
      labels:
        app: rook-ceph-tools
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: rook-ceph-default
      containers:
        - name: rook-ceph-tools
          image: quay.io/ceph/ceph:v18
          command:
            - /bin/bash
            - -c
            - |
              CEPH_CONFIG="/etc/ceph/ceph.conf"
              MON_CONFIG="/etc/rook/mon-endpoints"
              KEYRING_FILE="/etc/ceph/keyring"

              write_endpoints() {
                endpoints=$(cat ${MON_CONFIG})

                mon_endpoints=$(echo "${endpoints}"| sed 's/[a-z0-9_-]\+=//g')

                DATE=$(date)
                echo "$DATE writing mon endpoints to ${CEPH_CONFIG}: ${endpoints}"
                  cat <<EOF > ${CEPH_CONFIG}
              [global]
              mon_host = ${mon_endpoints}

              [client.admin]
              keyring = ${KEYRING_FILE}
              EOF
              }

              watch_endpoints() {
                real_path=$(realpath ${MON_CONFIG})
                initial_time=$(stat -c %Z "${real_path}")
                while true; do
                  real_path=$(realpath ${MON_CONFIG})
                  latest_time=$(stat -c %Z "${real_path}")

                  if [[ "${latest_time}" != "${initial_time}" ]]; then
                    write_endpoints
                    initial_time=${latest_time}
                  fi

                  sleep 10
                done
              }

              ceph_secret=${ROOK_CEPH_SECRET}
              if [[ "$ceph_secret" == "" ]]; then
                ceph_secret=$(cat /var/lib/rook-ceph-mon/secret.keyring)
              fi

              cat <<EOF > ${KEYRING_FILE}
              [${ROOK_CEPH_USERNAME}]
              key = ${ceph_secret}
              EOF

              write_endpoints

              watch_endpoints              
          imagePullPolicy: IfNotPresent
          tty: true
          securityContext:
            runAsNonRoot: false
            privileged: true
          env:
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: rook-ceph-mon
                  key: ceph-username
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-config
            - name: mon-endpoint-volume
              mountPath: /etc/rook
            - name: ceph-admin-secret
              mountPath: /var/lib/rook-ceph-mon
              readOnly: true
            - name: modules
              mountPath: /lib/modules
              readOnly: true
      volumes:
        - name: ceph-admin-secret
          secret:
            secretName: rook-ceph-mon
            optional: false
            items:
              - key: ceph-secret
                path: secret.keyring
        - name: mon-endpoint-volume
          configMap:
            name: rook-ceph-mon-endpoints
            items:
              - key: data
                path: mon-endpoints
        - name: ceph-config
          emptyDir: {}
        - name: modules
          hostPath:
            path: /lib/modules # directory location on host
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 5

Anyway, with the privileged option set, I was finally able to mount. Wanting to use rsync, I installed it with yum install rsync and mounted the baremetal and Rook CephFS subvolumes.

I used this command to execute the copy operation:

rsync -av --info=progress2 --info=name0 /mnt/baremetal/* /mnt/rook/

Here is the final output:

sent 1,748,055,479,314 bytes  received 155,334 bytes  54,890,039.24 bytes/sec
total size is 1,853,006,549,228  speedup is 1.06

The operation took a total of 9.5 h.

Deploying Jellyfin

Just for completeness’ sake, here is the Jellyfin Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
spec:
  replicas: 1
  selector:
    matchLabels:
      homelab/app: jellyfin
  strategy:
    type: "Recreate"
  template:
    metadata:
      labels:
        homelab/app: jellyfin
    spec:
      securityContext:
        fsGroup: 1006
        runAsUser: 1007
        runAsGroup: 1006
      containers:
        - name: jellyfin
          image: jellyfin/jellyfin:{{ .Values.appVersion }}
          command:
            - "/jellyfin/jellyfin"
            - "--datadir"
            - "{{ .Values.mounts.cacheAndConf }}/data"
            - "--cachedir"
            - "{{ .Values.mounts.cacheAndConf }}/cache"
            - "--ffmpeg"
            - "/usr/lib/jellyfin-ffmpeg/ffmpeg"
          volumeMounts:
            - name: cache-and-conf
              mountPath: {{ .Values.mounts.cacheAndConf }}
            - name: media
              mountPath: {{ .Values.mounts.media }}
          resources:
            requests:
              cpu: 1000m
              memory: 1000Mi
          livenessProbe:
            httpGet:
              port: {{ .Values.port }}
              path: "/health"
            initialDelaySeconds: 15
            periodSeconds: 30
          ports:
            - name: jellyfin-http
              containerPort: {{ .Values.port }}
              protocol: TCP
      volumes:
        - name: cache-and-conf
          persistentVolumeClaim:
            claimName: jellyfin-config-volume
        - name: media
          persistentVolumeClaim:
            claimName: jellyfin-media

The only things out of the ordinary here are the settings in spec.securityContext. They ensure that I get the right permissions on the files written to the media collection subvolume. All files on there have the GID 1006, which is historically my group on the first desktop connected to my first Homeserver, and it still serves as the shared group for my media collection, because both Jellyfin and my desktop user need to access the media files. With this configuration, Jellyfin writes new files with the correct GID.

Another somewhat interesting point about Jellyfin: It does allow changing around the config and cache directories, as you can see in the containers[0].command, but it does not allow the same for the location of the media libraries. Those locations are hardcoded. I had pretty big problems with this fact back when I migrated from Docker Compose to Nomad, but sadly that was before I took extensive notes or documented everything in my internal wiki, so I can’t repeat the manual steps I used to migrate the data location back then. 😔

And that’s it already for this one. As I noted above, this post will be followed shortly by another one looking at how Ceph behaved during the large copy operation.

My next migration this coming weekend will be my Nextcloud instance. I’ll need to look at some Helm charts, but at this point I’m pretty sure I’ll just end up writing my own.