In the last post of the series on my Kubernetes experiments, I described how to initialize the cluster. In this post, I will go into a bit more detail on what I did once I finally had a cluster set up.

Tutorials

Never having done anything with Kubernetes before, I started out with a couple of tutorials.

The first one was this one. It uses Redis as an example deployment to demonstrate how to use ConfigMaps. This is an interesting topic for me, because one of the things I liked a lot about Nomad was the tight integration with consul-template for config files and environment variables via the template stanza. This stanza allows the user to template config files with inputs taken from other tools. My main use case at the moment is taking secrets from Vault and injecting them into configuration files. Kubernetes does not have this capability out of the box, but I will get into how I do it further down in this post. The one important piece of knowledge I gained from this tutorial was that when a ConfigMap used by the Pod spec in a Deployment manifest changes, the Deployment’s Pods are not automatically restarted to take the new configuration into account. This is a bit annoying, to be honest, because it’s something which Nomad does out of the box, at least for certain ways of writing job files. The solution I found (while working with pure kubectl at least - using Helm, the problem can be solved more elegantly) was to just run kubectl rollout restart deployment <NAME>.
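
In practice, my workflow for a config change now looks roughly like this (the file and deployment names are just placeholders):

# Apply the updated ConfigMap first.
kubectl apply -f configmap.yaml
# Then trigger a rolling restart of the Deployment using it,
# so the Pods pick up the new configuration.
kubectl rollout restart deployment my-deployment
# Optionally watch until the new Pods are up.
kubectl rollout status deployment my-deployment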

Next up was a small tutorial setting up a Service for the first time with Nginx. At first I had a little problem with this one, because I had written the ConfigMap for it like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginxconfigmap
  labels:
    homelab/name: nginx
    homelab/component: webserver
    homelab/part-of: connecting-apps
    usage: tutorials
data:
  default: |
    server {
            listen 80 default_server;
            listen [::]:80 default_server ipv6only=on;

            listen 443 ssl;

            root /usr/share/nginx/html;
            index index.html;

            server_name localhost;
            ssl_certificate /etc/nginx/ssl/tls.crt;
            ssl_certificate_key /etc/nginx/ssl/tls.key;

            location / {
                    try_files $uri $uri/ =404;
            }
    }    

As a consequence, nothing came up in the Nginx container, but I also wasn’t getting any error messages in the logs. So at first I assumed that something was wrong with the Service setup, because I was getting “Connection refused” errors. But it turns out I just didn’t understand the ConfigMap semantics correctly. The keys under the data: key become actual filenames when the ConfigMap is mounted as a volume. So in the setup above, I was adding a file just called default and mounting it into the Nginx conf directory. But the main Nginx config only includes files with the .conf extension from the config snippet dir. And because there wasn’t anything malformed about the config itself, I simply got an Nginx instance without a server block, instead of some sort of error message. Just changing that default: key to default.conf: fixed the issue. This was also the first service I made available outside the cluster, using a NodePort type Service. It looks like this:

apiVersion: v1
kind: Service
metadata:
  name: connecting-apps-nginx
  labels:
    homelab/name: nginx
    homelab/component: webserver
    homelab/part-of: connecting-apps
    usage: tutorials
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    protocol: TCP
    name: https
  selector:
    homelab/name: nginx
    homelab/component: webserver
    homelab/part-of: connecting-apps

This Service listens on two random ports on every single Kubernetes node. If packets arrive on those ports, they are forwarded to the Pod running Nginx. At first, I thought this would be the way I would be running my Traefik ingress later on, but then I realized that while you can configure an explicit port for NodePort services, it has to fall within the cluster’s NodePort range, which is 30000-32767 by default.
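
For completeness: pinning an explicit port is done with the nodePort field on the port entry. Something along these lines should work, with 30080 being just an arbitrary pick from the default range:

  ports:
  - port: 8080
    targetPort: 80
    protocol: TCP
    name: http
    # Explicitly chosen node port; it has to fall within the
    # cluster's NodePort range, 30000-32767 by default.
    nodePort: 30080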

Next, I had a look at an example from Cilium, to get more comfortable with Network Policies. In this example, you launch a number of Star Wars themed services and decide docking permissions for the Death Star. The permissions are enforced with CiliumNetworkPolicy objects, and the example was pretty good at teaching me the basics. I’m especially interested in NetworkPolicy as a network connection permission mechanism. In my Nomad cluster, I’m using Consul Connect to control the connections between different services, deciding who can connect to whom, and I wanted something similar in Kubernetes. NetworkPolicies do exactly that, and this nice Star Wars demo from Cilium demonstrated it.
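
If I recall the demo correctly, its final L7 policy looks roughly like the following - only pods labeled as empire ships may POST to the landing-request endpoint on the Death Star, which nicely shows that Cilium can go beyond plain L3/L4 filtering (treat the details as approximate, they are quoted from memory):

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "rule1"
spec:
  description: "L7 policy to restrict access to specific HTTP call"
  endpointSelector:
    matchLabels:
      org: empire
      class: deathstar
  ingress:
  - fromEndpoints:
    - matchLabels:
        org: empire
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/v1/request-landing"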

The last tutorial was more of an “all-in-one” deal with a lot more complexity, connecting several services to each other. I made it even more interesting by instituting a deny-all network policy on the “default” namespace. As a consequence, I needed to make sure both that the Redis pods could talk to each other for replication and that the PHP frontend could talk to the Redis pods. After having just finished the Cilium demo, that part was pretty simple. Here, for example, is the policy that allows the frontend to reach Redis:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "guestbook-redis-allow"
  labels:
    homelab/name: guestbook-redis-allow
    homelab/part-of: guestbook
    usage: tutorial
spec:
  description: "L3-L4 policy to restrict redis access to frontend only"
  endpointSelector:
    matchLabels:
      homelab/name: redis
      homelab/component: key-value-store
      homelab/part-of: guestbook
  ingress:
  - fromEndpoints:
    - matchLabels:
        homelab/name: guestbook
        homelab/component: frontend
        homelab/part-of: guestbook
    toPorts:
      - ports:
        - port: "6379"
          protocol: TCP

What I overlooked completely: I also had to explicitly allow traffic from outside the cluster to reach the frontend. Without a rule for that, Cilium dutifully blocks all external traffic coming in via the NodePort service I had configured for the tutorial.
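
The rule for the external traffic is not shown above; it looked roughly like the following sketch, using Cilium’s world entity as the source. Depending on how the traffic reaches the Pod, it may also be attributed to the node itself, so the host and remote-node entities might have to be allowed as well; the frontend port of 80 is an assumption here:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "guestbook-frontend-from-world"
  labels:
    homelab/name: guestbook-frontend-from-world
    homelab/part-of: guestbook
    usage: tutorial
spec:
  description: "Allow traffic from outside the cluster to reach the frontend"
  endpointSelector:
    matchLabels:
      homelab/name: guestbook
      homelab/component: frontend
      homelab/part-of: guestbook
  ingress:
  - fromEntities:
    - world
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP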

How to handle all those manifests?

While doing the tutorials, I started wondering how to handle all of those YAML files in some better way than running kubectl apply for every one of them. At first I looked at Helm, which already came closer to what I wanted. But then somebody on Mastodon mentioned Helmfile. It uses Helm in the background, but adds a central config file that combines everything I’ve currently got deployed in my cluster, and it allows deploying all of it with a single command. Exactly what I was looking for.

Currently, with the Traefik Ingress I set up myself and the Ceph Rook Helm charts, my Helmfile looks like this:

repositories:
  - name: rook-release
    url: https://charts.rook.io/release
releases:
  - name: ceph-rook-operator
    chart: rook-release/rook-ceph
    version: v1.12.5
    namespace: rook-ceph
    values:
      - ./value-files/rook-operator.yaml
  - name: ceph-rook-cluster-internal
    chart: rook-release/rook-ceph-cluster
    version: v1.12.5
    namespace: rook-ceph-cluster-internal
    values:
      - ./value-files/rook-ceph-cluster-internal.yaml
  - name: traefik
    chart: ./traefik
    namespace: traefik-ingress
    values:
      - appVersion: "v2.10.4"
      - meiHomeNetCert:
          chain: |
            {{- "ref+vault://secret/cert#/foo" | fetchSecretValue | nindent 12 }}            
          key: |
            {{- "ref+vault://secret/cert#/bar" | fetchSecretValue | nindent 12 }}            
      - basicAuthAdminPw: {{ "ref+vault://secret/traefik/auth/baz#/pw" | fetchSecretValue }}

This format has several nice features. Each entry in the releases array is a different Helm chart. As you can see, for the Ceph Rook charts I use the official sources, while the Traefik chart comes from a local directory, as I wrote it myself. I will write separate blog posts about both Traefik as Ingress and Ceph Rook. Besides defining which charts to apply, you can also centrally define the namespaces and values. Here I’m using two different ways of defining values for the Helm charts. For the Ceph Rook deployments, I’m using separate value files, because they need a lot of config. Traefik’s values, on the other hand, I define directly inside the Helmfile.

Another big plus is Helmfile’s ability to get secrets from Vault via its vals integration, which I use here to get at my Let’s Encrypt certs.

Levels of templating: two. Helmfile’s own, and then Helm’s to generate the actual manifests.
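
With that file in place, the day-to-day commands stay pleasantly short. My workflow looks roughly like this, with the Vault address being a placeholder and the login happening however your Vault setup requires:

# Make Vault reachable for the ref+vault:// lookups.
export VAULT_ADDR=https://vault.example.com:8200
vault login
# Show what would change, then roll out everything in the Helmfile.
helmfile diff
helmfile apply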

While working on this and complaining a bit on Mastodon about the fact that Pods are not restarted when the config file changes, I was pointed towards a neat trick which can be applied when deploying with Helm. It uses the fact that whenever anything in a Deployment’s Pod template changes - including its annotations - the Deployment rolls out new Pods.

Let’s say we have a Deployment with a spec.template like this:

  template:
    metadata:
      labels:
        homelab/name: "traefik"
      annotations:
        checksum/static-conf: {{ include (print $.Template.BasePath "/static-conf.yml") . | sha256sum }}
    spec:
      containers:
        - name: traefik
          image: traefik:{{ .Values.appVersion }}
          volumeMounts:
            - name: static-conf
              mountPath: "/etc/traefik"
              readOnly: true
      volumes:
        - name: static-conf
          configMap:
            name: traefik-static-conf

And then we have a ConfigMap like this at templates/static-conf.yml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-static-conf
  labels:
    homelab/name: "traefik"
    {{- range $label, $value := .Values.commonLabels }}
    {{ $label }}: {{ $value | quote }}
    {{- end }}
data:
  traefik.yml: |
{{ tpl (.Files.Get "configs/static.yml") . | indent 4 }}

Then, whenever this ConfigMap - or the configs/static.yml referenced in the tpl function - changes, the checksum annotation on the Deployment’s Pod template changes as well, and the Pods are rolled. This way, the Pods are automatically restarted whenever the config file changes.

You can also see another nice point about using Helmfile, at least with local Helm charts you create yourself: I can define central, common labels which get applied to all resources.
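
That is what the .Values.commonLabels loop in the ConfigMap above is for. The labels themselves are set once in the Helmfile, roughly like this - the concrete label values here are just an illustration:

  - name: traefik
    chart: ./traefik
    namespace: traefik-ingress
    values:
      # Rendered onto every resource of the chart via the
      # commonLabels range loop in the templates.
      - commonLabels:
          homelab/part-of: "ingress"
          usage: "homelab"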

Secrets

One little story about Secrets I need to tell here: I got myself utterly confused about how Secrets work. For some reason, I got it into my head for several days that with Secrets being stored just in plain text (okay, base64 encoded), they were a security risk. Going from that, I also felt that things like external-secrets wouldn’t add anything - yes, it takes the secrets from e.g. Vault, but then they are again stored unencrypted in the cluster.

But of course that’s a misconception. Secrets cannot just randomly be accessed by anything running in a Kubernetes cluster, which was my initial impression. Access goes through the Kubernetes API server and can be controlled via RBAC. So for now at least, I decided to rely on Helmfile’s vals integration to extract the secrets from Vault at deployment time. This just looks simpler than setting up e.g. external-secrets. I also see a security advantage here, albeit a small one, because I don’t need to configure anything with broad access to my Vault instance in the cluster. Instead, I can rely on Vault’s login mechanisms on my Command and Control host, which uses time-limited tokens and such.
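
To make that concrete: the ref+vault:// values from the Helmfile arrive in the chart as ordinary Helm values, and a template - sketched here with illustrative names - turns them into a Secret, so nothing inside the cluster ever needs to talk to Vault directly:

apiVersion: v1
kind: Secret
metadata:
  name: traefik-basic-auth
  labels:
    homelab/name: "traefik"
type: Opaque
stringData:
  # Filled in by Helmfile/vals at deployment time from
  # ref+vault://secret/traefik/auth/baz#/pw
  adminPassword: {{ .Values.basicAuthAdminPw | quote }}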

Showing cluster resource usage

All the while, I was also looking for some place where I could see how much capacity I still had free in my experimental cluster. This is something Nomad has baked into its web UI, but Kubernetes does not have anything like it out of the box.

I was finally pointed towards kube-capacity, a kubectl plugin. It does exactly what I wanted, telling me how much free capacity I still have left at the cluster and individual node level. The output looks something like this at the moment:

kubectl resource-capacity
NODE     CPU REQUESTS   CPU LIMITS     MEMORY REQUESTS   MEMORY LIMITS
*        17600m (44%)   31100m (77%)   27551Mi (58%)     46408Mi (98%)
mehen    2150m (53%)    3000m (75%)    1524Mi (43%)      3472Mi (98%)
mesta    1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
min      1950m (48%)    3000m (75%)    1384Mi (39%)      3132Mi (88%)
nakith   4450m (74%)    9300m (155%)   9660Mi (87%)      15008Mi (136%)
naunet   5350m (89%)    9900m (165%)   10472Mi (95%)     16032Mi (145%)
sait     950m (11%)     1200m (15%)    1619Mi (22%)      2560Mi (35%)
sehith   800m (10%)     1700m (21%)    1508Mi (20%)      3072Mi (42%)

This shows me that all of my current Pods’ requests together use 44% of the available CPU capacity and 58% of the memory capacity. It has already been pretty useful, for example for figuring out why I wasn’t able to deploy all of my Ceph Rook pods.
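
The plugin can also break the numbers down further. If I remember the flags correctly, something like the following shows a per-Pod breakdown, and with metrics-server installed even the actual utilization - best double-checked against kubectl resource-capacity --help:

# Per-Pod breakdown of requests and limits on every node.
kubectl resource-capacity --pods
# Additionally show actual usage, which requires metrics-server in the cluster.
kubectl resource-capacity --pods --util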

Thanks to the Homelabbers in the Fediverse

Kubernetes is a pretty complex topic, as I have now found out. There are a lot of pitfalls and great tools to avoid them which I might never have found just from Googling. The Fediverse homelabbing community has been extremely helpful in pointing me in the right direction multiple times, e.g. in recommending Helmfile and kube-capacity to me.

Thanks everyone! 🙂