Wherein I let my kubectl certs expire and implement some monitoring.
A couple of days ago, I was getting through my list of small maintenance tasks in my Kubernetes cluster. Stuff like checking the resource consumption of new deployments and adapting the resource limits. And in the middle of it, one of my kubectl invocations was greeted by this message:
error: You must be logged in to the server (Unauthorized)
So I had a look at my kubectl credentials. For those who don’t know, kubectl
authenticates to the cluster with a client TLS cert by default. I had just
copied the admin.conf config file kubeadm helpfully creates during cluster
setup. I didn’t really see any reason to set up anything more elaborate,
considering that I’m the only admin in the cluster.
And those certs had now expired. Not really a big deal, I have access to the
control plane nodes and could copy the new admin.conf. But I wanted to
introduce some monitoring and document how to renew the kubectl client certs.
The first problem to tackle: I wanted something a bit more elaborate than
“just cat /etc/kubernetes/admin.conf and copy+paste the cert and key”. And
here’s where the embarrassment began. The admin.conf is available on my three
control plane nodes. But how to get it onto my command and control machine?
My first thought was: Just use SSH! But the problem was: I don’t allow root
logins via SSH. And the admin.conf is owned by root and not readable by anyone
else. So if I wanted to do it over SSH, I would need to also somehow get a sudo
call in there. Easier said than done. Because the only account which has SSH
access to my machines can’t just do sudo - it needs to provide a password, as
an additional security layer. And it took me a really, really long time to
figure out how to call sudo via SSH and get the password through the pipe to sudo.
Here’s the script I came up with:
#!/bin/bash
# Kubeadm installs put an admin kubeconfig file at /etc/kubernetes/admin.conf
# by default
ADMIN_FILE="/etc/kubernetes/admin.conf"
ADMIN_TEMP="${HOME}/temp/admin.conf"
# Name of the control plane host
CP_HOST="control-plane-1"
# Request the sudo password and put it into SUDO_PASS
# -s prevents echoing of the input on the terminal
read -p "Sudo pass: " -r -s SUDO_PASS
echo
ssh myuser@"${CP_HOST}" "sudo -p \"\" -S cat ${ADMIN_FILE}" <<<"${SUDO_PASS}" > "${ADMIN_TEMP}"
# Extract the certificate and the private key from the kubeconfig. The sed/tr
# pipeline turns the multi-line PEM into a single line with literal \n sequences.
CERT_DATA=$(yq -r '.users[0].user."client-certificate-data"' "${ADMIN_TEMP}" | base64 -d | sed -e 's/$/\\n/g' | tr -d '\n')
CERT_KEY=$(yq -r '.users[0].user."client-key-data"' "${ADMIN_TEMP}" | base64 -d | sed -e 's/$/\\n/g' | tr -d '\n')
# Removing the temporary file for security
rm "${ADMIN_TEMP}"
# Finally, output the cert and key
echo "CERT:"
echo "${CERT_DATA}"
echo "Key:"
echo "${CERT_KEY}"
The main piece here is the actual copying, which took me way too long to figure out:
ssh myuser@"${CP_HOST}" "sudo -p \"\" -S cat ${ADMIN_FILE}" <<<"${SUDO_PASS}" > "${ADMIN_TEMP}"
It SSH’s to one of my CP hosts and runs sudo -p "" -S cat /etc/kubernetes/admin.conf.
The password previously captured via read is fed into the SSH command’s
stdin as a here-string. The -p "" is actually load-bearing here. Without it,
sudo would print its password prompt, which would end up being redirected
into the temporary file alongside the admin.conf file’s content.
The -S option tells sudo to read the password from standard input instead of
asking on the terminal.
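To see the two options in isolation, here’s a minimal local sketch of the same pattern (it assumes your account is allowed to sudo with a password; the id -u is just a stand-in for any privileged command):
read -p "Sudo pass: " -r -s SUDO_PASS
echo
# -S makes sudo read the password from stdin, -p "" replaces the prompt text with nothing
sudo -p "" -S id -u <<<"${SUDO_PASS}"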
Another nifty little thing I discovered is yq, basically the equivalent of jq, but for YAML files.
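The only mildly surprising bit of the syntax is that keys containing dashes have to be quoted inside the expression. As a quick illustration (assuming the admin.conf is still sitting at ${ADMIN_TEMP}), the same approach works for other fields, and the decoded cert can be piped straight into openssl:
# The API server URL from the kubeconfig
yq -r '.clusters[0].cluster.server' "${ADMIN_TEMP}"
# Decode the client cert and print its expiry date in one go
yq -r '.users[0].user."client-certificate-data"' "${ADMIN_TEMP}" | base64 -d | openssl x509 -noout -enddate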
I updated my credentials and everything worked again. But the fact that I allowed the certs to expire bugged me, and I decided to introduce another little script to regularly check the time to expiry of the kubectl client certs.
Monitoring the certs
The main problem with monitoring the cert was that it’s a client cert, so there’s
no HTTP endpoint I could hit to check it regularly. It is only present on my
command and control machine. So I needed something that runs on the C&C host,
and that I wouldn’t forget to check regularly. I ended up writing a small script
which checks the expiration dates and tucking it into my ~/.profile so it runs
whenever I log into the machine.
The script looks like this:
#!/bin/bash
# 30 days
WARNING_DURATION="2592000"
COLOR_RED='\e[0;31m'
NO_COLOR='\033[0m'
PROD_CERT=$(pass show k8s/credentials | jq -r .status.clientCertificateData)
CONFIG_CERT=$(pass show k8s/master-credentials | jq -r .status.clientCertificateData)
function checkExpiry() {
  cluster="${1}"
  cert="${2}"
  if ! openssl x509 -checkend "${WARNING_DURATION}" -noout > /dev/null <<<"${cert}"; then
    local endDate
    endDate=$(openssl x509 -enddate -noout <<<"${cert}" | cut -d '=' -f2)
    printf "${COLOR_RED}The ${cluster} cluster kubectl cert is about to expire!\nEnd date: %b${NO_COLOR}\n" "${endDate}"
  fi
}
checkExpiry "production" "${PROD_CERT}"
checkExpiry "configuration" "${CONFIG_CERT}"
I’m starting out by fetching the credentials from my pass store. If you want to read more about my kube credential setup and how I changed it so that the kubectl credentials don’t just sit unencrypted on the disk, have a look at this post.
I’m using the openssl command line tool to do the checking, which already has
the -checkend option to check whether the given certificate is still valid for at least
${WARNING_DURATION} more seconds. Quite a useful feature, removing the need to
do date arithmetic in bash. If the cert is not valid for at least another 30
days, the script will output a warning in red. 30 days should be enough time for
me to log into the C&C host at least once, even during times like the current
one where I’m not working on Homelab projects much.
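In case the -checkend behavior isn’t familiar: it just sets the exit code depending on whether the cert is still valid for the given number of seconds, which makes it easy to use in a shell conditional. A standalone sketch with a hypothetical cert.pem:
# Exit code 0: still valid in 30 days. Exit code 1: expires (or has already expired) within 30 days.
if openssl x509 -checkend 2592000 -noout -in cert.pem > /dev/null; then
  echo "cert is fine for at least another 30 days"
else
  echo "cert expires within the next 30 days"
fi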
I’m calling the checkExpiry function twice, because I’ve got two clusters and
hence two sets of credentials. One is my main cluster running most of my workloads.
The other is intended as a management cluster. It’s currently still running in a
VM I only launch when needed, as part of my Tinkerbell experiments. I really need
to get back to those at some point…
My plan was to just stick the script into my ~/.profile file, so the check is
only done once, when I log into the machine. The ~/.profile script is only
sourced for a login shell, so it should not be executed when I’m just opening a
fresh terminal. But this didn’t work out as intended. I’m using tmux,
and for some reason, the script was executed whenever I opened a new pane or window.
After some searching, I found that tmux runs a login shell for every new pane/window
by default.
I found the solution for changing that behavior in the Arch Linux wiki.
Following that instruction, I put the following line at the end of my ~/.tmux.conf
file:
set -g default-command "${SHELL}"
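A quick way to verify the change is to ask bash itself whether the current shell is a login shell; after reloading the tmux config, freshly opened panes should report that they’re not:
# Prints "login shell" only when bash was started as a login shell
shopt -q login_shell && echo "login shell" || echo "non-login shell"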
With that, I’d get the following output when the kubectl client cert gets close to the expiration date:
The production cluster kubectl cert is about to expire!
End date: Sep 14 11:31:30 2026 GMT
The configuration cluster kubectl cert is about to expire!
End date: May 31 20:29:11 2026 GMT
Monitoring kubeadm certs
While looking for instructions on how to renew my kubectl certs, I came upon this Kubernetes docs page. It mentions this command for getting the expiration dates of Kubeadm’s own certs:
kubeadm certs check-expiration
This command shows all of the certificates kubeadm generates for a cluster, including the certs for all of the Kubernetes control plane components:
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 14, 2026 11:31 UTC   281d            ca                      no
apiserver                  Sep 14, 2026 10:24 UTC   281d            ca                      no
apiserver-etcd-client      Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
apiserver-kubelet-client   Sep 14, 2026 10:24 UTC   281d            ca                      no
controller-manager.conf    Sep 14, 2026 10:24 UTC   281d            ca                      no
etcd-healthcheck-client    Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-peer                  Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-server                Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
front-proxy-client         Sep 14, 2026 10:24 UTC   281d            front-proxy-ca          no
scheduler.conf             Sep 14, 2026 10:24 UTC   281d            ca                      no
CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 17, 2033 19:15 UTC   8y              no
etcd-ca                 Dec 17, 2033 19:15 UTC   8y              no
front-proxy-ca          Dec 17, 2033 19:15 UTC   8y              no
Thinking back a little bit, I recalled that September 14th was the last time I ran a cluster update, so those updates evidently already renew the certificates. In theory, that means I should be fine: I’m doing cluster updates frequently enough that I should never let those certs expire within their 365-day TTL. But I still wanted to monitor them somehow, just in case.
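Before getting to the monitoring: should I ever need to renew outside of a regular cluster upgrade, manual renewal with kubeadm boils down to something like this on a control plane node (followed by restarting the control plane components). I haven’t needed it yet:
# Renew all kubeadm-managed certificates on this node
sudo kubeadm certs renew all
# Or just the admin kubeconfig cert that expired on me
sudo kubeadm certs renew admin.conf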
As some of those are client certs, I couldn’t just point my Gatus instance at them like I do for my Let’s Encrypt main cert. While looking around, I came across this Prometheus exporter. It can be deployed as a DaemonSet on the k8s nodes and then watch certificate files (and kubeconfig files as well) on disk and check their expiration dates. In short, it looked like exactly what I wanted. But there was a problem, as stated in their docs:
Be aware that for every file path provided to watchFiles, the exporter container will be given read access to the parent directory. This is how we handle the problem of changing inodes. Metrics will of course be limited to the single targetted path, as the program is told to watch the real path from watchFiles.
The full note explains that making the containing directory available is necessary
because when the certs are rotated, the exporter would keep the old file open, as
it wouldn’t have a way to know that the file was rotated. This makes sense. But
I find it problematic. The /etc/kubernetes/pki directory on my control plane
nodes looks like this:
-rw-r--r-- 1 root root 1123 Sep 14 12:26 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1176 Sep 14 12:26 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1314 Sep 14 12:26 apiserver.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver.key
-rw-r--r-- 1 root root 1107 May 1 2025 ca.crt
-rw------- 1 root root 1675 May 1 2025 ca.key
drwxr-xr-x 2 root root 4096 May 1 2025 etcd
-rw-r--r-- 1 root root 1123 May 1 2025 front-proxy-ca.crt
-rw------- 1 root root 1679 May 1 2025 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Sep 14 12:26 front-proxy-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 front-proxy-client.key
-rw------- 1 root root 1679 May 1 2025 sa.key
-rw-r--r-- 1 root root 451 May 1 2025 sa.pub
So if I were to tell the exporter to watch all of the .crt files, it would also
necessarily gain read access to the .key files. Which means that I would now
have a program running in my cluster which could read the certificates and private
keys of the main Kubernetes infrastructure in my Homelab. That just does not
sound like a good idea to me.
I wasn’t able to come up with a proper solution, so I decided to just monitor the apiserver certificate and use it as a stand-in for the other certs’ expiration dates. They should all be renewed together during my regular cluster updates, so monitoring just one of the certs should be good enough. 🤞
I did not even have to make any changes in Gatus, as it already reports the expiry dates of all certificates for HTTPS endpoints it monitors. Creating a Grafana panel was as easy as using this PromQL query:
gatus_results_certificate_expiration_seconds{name="K8s: API"}
It refers to this entry in my Gatus config file:
- name: "K8s: API"
group: "K8s"
url: "https://k8s.example.com:6443/livez"
method: "GET"
interval: 5m
conditions:
- "[STATUS] == 200"
client:
insecure: true
One last thing that still bothers me slightly: the CA certs. Those expire in 8 years, and I decided not to bother monitoring them. I will leave them unmonitored to add a bit of potential excitement to future me’s life. 😁