Wherein I let my kubectl certs expire and implement some monitoring.
A couple of days ago, I was getting through my list of small maintenance tasks in my Kubernetes cluster. Stuff like checking the resource consumption of new deployments and adapting the resource limits. And in the middle of it, one of my kubectl invocations was greeted by this message:
error: You must be logged in to the server (Unauthorized)
So I had a look at my kubectl credentials. For those who don’t know, kubectl
authenticates to the cluster with a client TLS cert by default. I had just
copied the admin.conf config file kubeadm helpfully creates during cluster
setup. I didn’t really see any reason to set up anything more elaborate,
considering that I’m the only admin in the cluster.
And those certs had now expired. Not really a big deal, I have access to the
control plane nodes and could copy the new admin.conf. But I wanted to
introduce some monitoring and document how to renew the kubectl client certs.
The first problem to tackle: I wanted something a bit more elaborate than
“just cat /etc/kubernetes/admin.conf and copy+paste the cert and key”. And
here’s where the embarrassment began. The admin.conf is available on my three
control plane nodes. But how to get it onto my command and control machine?
My first thought was: Just use SSH! But the problem was: I don’t allow root
logins via SSH. And the admin.conf is owned by root and not readable by anyone
else. So if I wanted to do it over SSH, I would need to also somehow get a sudo
call in there. Easier said than done. Because the only account which has SSH
access to my machines can’t just do sudo - it needs to provide a password, as
an additional security layer. And it took me a really, really long time to
figure out how to call sudo via SSH and get the password through the pipe to sudo.
Here’s the script I came up with:
#!/bin/bash
# Kubeadm installs put an admin kubeconfig file at /etc/kubernetes/admin.conf
# by default
ADMIN_FILE="/etc/kubernetes/admin.conf"
ADMIN_TEMP="${HOME}/temp/admin.conf"
# Name of the control plane host
CP_HOST="control-plane-1"
# Request the sudo password and put it into SUDO_PASS
# -s prevents echoing of the input on the terminal
read -p "Sudo pass: " -r -s SUDO_PASS
echo
ssh myuser@"${CP_HOST}" "sudo -p \"\" -S cat ${ADMIN_FILE}" <<<"${SUDO_PASS}" > "${ADMIN_TEMP}"
# Extract the certificate and the private key from the kubeconfig. The sed/tr
# pipeline turns the multi-line PEM into a single line with literal \n sequences.
CERT_DATA=$(yq -r '.users[0].user."client-certificate-data"' "${ADMIN_TEMP}" | base64 -d | sed -e 's/$/\\n/g' | tr -d '\n')
CERT_KEY=$(yq -r '.users[0].user."client-key-data"' "${ADMIN_TEMP}" | base64 -d | sed -e 's/$/\\n/g' | tr -d '\n')
# Removing the temporary file for security
rm "${ADMIN_TEMP}"
# Finally, output the cert and key
echo "CERT:"
echo "${CERT_DATA}"
echo "Key:"
echo "${CERT_KEY}"
The main piece here is the actual copying, which took me way too long to figure out:
ssh myuser@"${CP_HOST}" "sudo -p \"\" -S cat ${ADMIN_FILE}" <<<"${SUDO_PASS}" > "${ADMIN_TEMP}"
It SSH’s to one of my CP hosts and runs sudo -p "" -S cat /etc/kubernetes/admin.conf.
The password previously captured via read is fed into the SSH command’s
stdin as a here-string. The -p "" is actually load-bearing here. Without it,
sudo would print its password prompt, which would end up being redirected
into the temporary file alongside the admin.conf file’s content.
The -S option tells sudo to read the password from standard input instead of
asking on the terminal.
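To see the two options in isolation, here’s a minimal local sketch of the same pattern (it assumes your account is allowed to sudo with a password; the id -u is just a stand-in for any privileged command):
read -p "Sudo pass: " -r -s SUDO_PASS
echo
# -S makes sudo read the password from stdin, -p "" replaces the prompt text with nothing
sudo -p "" -S id -u <<<"${SUDO_PASS}"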
Another nifty little thing I discovered is yq, basically the equivalent of jq, but for YAML files.
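The only mildly surprising bit of the syntax is that keys containing dashes have to be quoted inside the expression. As a quick illustration (assuming the admin.conf is still sitting at ${ADMIN_TEMP}), the same approach works for other fields, and the decoded cert can be piped straight into openssl:
# The API server URL from the kubeconfig
yq -r '.clusters[0].cluster.server' "${ADMIN_TEMP}"
# Decode the client cert and print its expiry date in one go
yq -r '.users[0].user."client-certificate-data"' "${ADMIN_TEMP}" | base64 -d | openssl x509 -noout -enddate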
I updated my credentials and everything worked again. But the fact that I allowed the certs to expire bugged me, and I decided to introduce another little script to regularly check the time to expiry of the kubectl client certs.
Monitoring the certs
The main problem with monitoring the cert was that it’s a client cert, so there’s
no HTTP endpoint I could hit to check it regularly. It is only present on my
command and control machine. So I needed something that runs on the C&C host,
and that I wouldn’t forget to check regularly. I ended up writing a small script
which checks the expiration dates and tucking it into my ~/.profile so it runs
whenever I log into the machine.
The script looks like this:
#!/bin/bash
# 30 days
WARNING_DURATION="2592000"
COLOR_RED='\e[0;31m'
NO_COLOR='\033[0m'
PROD_CERT=$(pass show k8s/credentials | jq -r .status.clientCertificateData)
CONFIG_CERT=$(pass show k8s/master-credentials | jq -r .status.clientCertificateData)
function checkExpiry() {
  cluster="${1}"
  cert="${2}"
  if ! openssl x509 -checkend "${WARNING_DURATION}" -noout > /dev/null <<<"${cert}"; then
    local endDate
    endDate=$(openssl x509 -enddate -noout <<<"${cert}" | cut -d '=' -f2)
    printf "${COLOR_RED}The ${cluster} cluster kubectl cert is about to expire!\nEnd date: %b${NO_COLOR}\n" "${endDate}"
  fi
}
checkExpiry "production" "${PROD_CERT}"
checkExpiry "configuration" "${CONFIG_CERT}"
I’m starting out by fetching the credentials from my pass store. If you want to read more about my kube credential setup and how I changed it so that the kubectl credentials don’t just sit unencrypted on the disk, have a look at this post.
I’m using the openssl command line tool to do the checking, which already has
the -checkend option to check whether the given certificate is still valid for at least
${WARNING_DURATION} more seconds. Quite a useful feature, removing the need to
do date arithmetic in bash. If the cert is not valid for at least another 30
days, the script will output a warning in red. 30 days should be enough time for
me to log into the C&C host at least once, even during times like the current
one where I’m not working on Homelab projects much.
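In case the -checkend behavior isn’t familiar: it just sets the exit code depending on whether the cert is still valid for the given number of seconds, which makes it easy to use in a shell conditional. A standalone sketch with a hypothetical cert.pem:
# Exit code 0: still valid in 30 days. Exit code 1: expires (or has already expired) within 30 days.
if openssl x509 -checkend 2592000 -noout -in cert.pem > /dev/null; then
  echo "cert is fine for at least another 30 days"
else
  echo "cert expires within the next 30 days"
fi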
I’m calling the checkExpiry function twice, because I’ve got two clusters and
hence two sets of credentials. One is my main cluster running most of my workloads.
The other is intended as a management cluster. It’s currently still running in a
VM I only launch when needed, as part of my Tinkerbell experiments. I really need
to get back to those at some point…
My plan was to just stick the script into my ~/.profile file, so the check is
only done once, when I log into the machine. The ~/.profile script is only
sourced for a login shell, so it should not be executed when I’m just opening a
fresh terminal. But this didn’t work out as intended. I’m using tmux,
and for some reason, the script was executed whenever I opened a new pane or window.
After some searching, I found that tmux runs a login shell for every new pane/window
by default.
I found the solution for changing that behavior in the Arch Linux wiki.
Following that instruction, I put the following line at the end of my ~/.tmux.conf
file:
set -g default-command "${SHELL}"
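A quick way to verify the change is to ask bash itself whether the current shell is a login shell; after reloading the tmux config, freshly opened panes should report that they’re not:
# Prints "login shell" only when bash was started as a login shell
shopt -q login_shell && echo "login shell" || echo "non-login shell"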
With that, I’d get the following output when the kubectl client cert gets close to the expiration date:
The production cluster kubectl cert is about to expire!
End date: Sep 14 11:31:30 2026 GMT
The configuration cluster kubectl cert is about to expire!
End date: May 31 20:29:11 2026 GMT
Monitoring kubeadm certs
While looking for instructions on how to renew my kubectl certs, I came upon this Kubernetes docs page. It mentions this command for getting the expiration dates of Kubeadm’s own certs:
kubeadm certs check-expiration
This command shows all of the certificates kubeadm generates for a cluster, including the certs for all of the Kubernetes control plane components:
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 14, 2026 11:31 UTC   281d            ca                      no
apiserver                  Sep 14, 2026 10:24 UTC   281d            ca                      no
apiserver-etcd-client      Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
apiserver-kubelet-client   Sep 14, 2026 10:24 UTC   281d            ca                      no
controller-manager.conf    Sep 14, 2026 10:24 UTC   281d            ca                      no
etcd-healthcheck-client    Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-peer                  Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
etcd-server                Sep 14, 2026 10:24 UTC   281d            etcd-ca                 no
front-proxy-client         Sep 14, 2026 10:24 UTC   281d            front-proxy-ca          no
scheduler.conf             Sep 14, 2026 10:24 UTC   281d            ca                      no
CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 17, 2033 19:15 UTC   8y              no
etcd-ca                 Dec 17, 2033 19:15 UTC   8y              no
front-proxy-ca          Dec 17, 2033 19:15 UTC   8y              no
Thinking back a little bit, I recalled that September 14th was the last time I ran a cluster update, so those updates evidently already renew the certificates. In theory, that means I should be fine: I’m doing cluster updates frequently enough that I should never let those certs expire within their 365-day TTL. But I still wanted to monitor them somehow, just in case.
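Before getting to the monitoring: should I ever need to renew outside of a regular cluster upgrade, manual renewal with kubeadm boils down to something like this on a control plane node (followed by restarting the control plane components). I haven’t needed it yet:
# Renew all kubeadm-managed certificates on this node
sudo kubeadm certs renew all
# Or just the admin kubeconfig cert that expired on me
sudo kubeadm certs renew admin.conf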
As some of those are client certs, I couldn’t just point my Gatus instance at them like I do for my Let’s Encrypt main cert. While looking around, I came across this Prometheus exporter. It can be deployed as a DaemonSet on the k8s nodes and then watch certificate files (and kubeconfig files as well) on disk and check their expiration dates. In short, it looked like exactly what I wanted. But there was a problem, as stated in their docs:
Be aware that for every file path provided to watchFiles, the exporter container will be given read access to the parent directory. This is how we handle the problem of changing inodes. Metrics will of course be limited to the single targetted path, as the program is told to watch the real path from watchFiles.
The full note explains that making the containing directory available is necessary
because when the certs are rotated, the exporter would keep the old file open, as
it wouldn’t have a way to know that the file was rotated. This makes sense. But
I find it problematic. The /etc/kubernetes/pki directory on my control plane
nodes looks like this:
-rw-r--r-- 1 root root 1123 Sep 14 12:26 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1176 Sep 14 12:26 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1314 Sep 14 12:26 apiserver.crt
-rw------- 1 root root 1675 Sep 14 12:26 apiserver.key
-rw-r--r-- 1 root root 1107 May 1 2025 ca.crt
-rw------- 1 root root 1675 May 1 2025 ca.key
drwxr-xr-x 2 root root 4096 May 1 2025 etcd
-rw-r--r-- 1 root root 1123 May 1 2025 front-proxy-ca.crt
-rw------- 1 root root 1679 May 1 2025 front-proxy-ca.key
-rw-r--r-- 1 root root 1119 Sep 14 12:26 front-proxy-client.crt
-rw------- 1 root root 1675 Sep 14 12:26 front-proxy-client.key
-rw------- 1 root root 1679 May 1 2025 sa.key
-rw-r--r-- 1 root root 451 May 1 2025 sa.pub
So if I were to tell the exporter to watch all of the .crt files, it would also
necessarily gain read access to the .key files. Which means that I would now
have a program running in my cluster which could read the certificates and private
keys of the main Kubernetes infrastructure in my Homelab. That just does not
sound like a good idea to me.
I wasn’t able to come up with a proper solution, so I decided to just monitor the apiserver certificate and use it as a stand-in for the other certs’ expiration dates. They should all be renewed together during my regular cluster updates, so monitoring just one of the certs should be good enough. 🤞
I did not even have to make any changes in Gatus, as it already reports the expiry dates of all certificates for HTTPS endpoints it monitors. Creating a Grafana panel was as easy as using this PromQL query:
gatus_results_certificate_expiration_seconds{name="K8s: API"}
It refers to this entry in my Gatus config file:
- name: "K8s: API"
group: "K8s"
url: "https://k8s.example.com:6443/livez"
method: "GET"
interval: 5m
conditions:
- "[STATUS] == 200"
client:
insecure: true
One last thing that still bothers me slightly: the CA certs. Those expire in 8 years, and I decided not to bother monitoring them. I will leave them unmonitored to add a bit of potential excitement to future me’s life. 😁