After setting up my lab environment in the previous article, I’ve now also set up the Kubernetes cluster itself, with kubeadm as the setup tool and Cilium as the CNI plugin for networking.
Here, I will describe why I chose the tools I did, and how I initialized the cluster, as well as how to remove the cluster when necessary.
Tools choice
Before setting up a cluster, several choices need to be made. The first one, in the case of Kubernetes, is which distribution to use.
The first option, and the one I chose, is "vanilla" k8s. This is the upstream distribution, with full support for all the related standards and functionality.
Another well-liked one is k3s, which bills itself as a lightweight distribution. Its most distinguishing feature seems to be the fact that its control plane ships as a single binary, instead of an entire set of separate components, as in the case of vanilla k8s. Also in contrast to k8s, it uses a simple SQLite database as a storage backend for cluster data, instead of a full etcd cluster.
It also falls into the "opinionated" part of the tech spectrum. Instead of making you choose things like the CNI and CRI yourself, it already comes with some options out of the box. Flannel is pre-chosen as the CNI plugin, while e.g. Traefik is already set up as an Ingress Controller.
If you want to go even further from vanilla, there are also things like Talos Linux. It's an entire Linux distro made with only one goal: running a Kubernetes cluster. It doesn't even allow you to SSH into the machine.
For now, I will stay with vanilla k8s, which I will install with kubeadm. Simply because I like making the "vanilla" experience my first contact with a piece of tech. I also prefer being forced to make my own decisions on tools, because it makes me inform myself about the alternatives. Once I've completed my current experimentation, I will likely at least take another look at Talos Linux. Its premise sounds quite interesting, especially with the declarative config files, but the "no SSH" part honestly feels somewhat weird to me.
The next choice to be made is the CRI, the container runtime. The only thing I knew going into this was that I did not want to go with Docker. Too many bad experiences with memory leaks and other shenanigans with their daemon. After some research, my choice fell on CRI-O. To be honest, mostly because it bills itself as a container engine focused on use with Kubernetes.
Next is the CNI plugin. This is the piece of the Kubernetes stack which controls networking, most importantly inter-Pod networking. Here I had the hardest time choosing. The websites of all the candidates are chock-full of buzzwords. eBPF! It's better than sliced bread! 😒 In the end, my decision was between Cilium and Calico.
The one feature I was really interested in and definitely wanted was Network Policies. They allow defining rules for inter-Pod connectivity, so I can control which pods are allowed to talk to each other. I like having this for the sake of security, so that e.g. only the apps which actually need a DB can talk to the Postgres pod. In my current HashiCorp Nomad based cluster, I've got something similar via Consul's service mesh.
One more thing I find pretty nice is that both Calico and Cilium support encryption. This was another reason why I started using Consul: It provides me with encrypted network traffic, without me having to set up TLS certs for each individual service.
In the end, even after reading through most of the docs for both Calico and Cilium, I still didn't know which one to choose. So I did the obvious thing:
And that’s how I came to use Cilium as the CNI plugin in my cluster. 😅
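Just to illustrate what I mean by Network Policies above: a minimal sketch of a policy that only lets pods labeled app: backend talk to a Postgres pod on its usual port. All names, labels and the port here are made up for the example, not taken from my actual setup.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-postgres
spec:
  # The policy applies to the Postgres pods
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    # Only pods labeled app=backend may connect, and only on the Postgres port
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432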
Without further ado, let’s conjure ourselves a Kubernetes cluster. 🤓
Preparing the machines
Before we can actually call kubeadm init
, we need to install the tools on the
machines and do some additional config. For most of the setup, I followed the
official Kubernetes kubeadm docs.
Before installing the tools, a couple of config options need to be set on the machines, defined here.
I configured the options using Ansible:
- name: load overlay kernel module
  tags:
    - kubernetes
    - kernel
  community.general.modprobe:
    name: overlay
    persistent: present
    state: present
- name: load br_netfilter kernel module
  tags:
    - kubernetes
    - kernel
  community.general.modprobe:
    name: br_netfilter
    persistent: present
    state: present
- name: enable ipv4 netfilter on bridge interfaces
  tags:
    - kubernetes
    - kernel
  ansible.posix.sysctl:
    name: net.bridge.bridge-nf-call-iptables
    value: 1
    state: present
- name: enable ipv6 netfilter on bridge interfaces
  tags:
    - kubernetes
    - kernel
  ansible.posix.sysctl:
    name: net.bridge.bridge-nf-call-ip6tables
    value: 1
    state: present
- name: enable ipv4 forwarding
  tags:
    - kubernetes
    - kernel
  ansible.posix.sysctl:
    name: net.ipv4.ip_forward
    value: 1
    state: present
This takes care of the pre-config. But if you’re using the Ubuntu cloud image, for example with LXD VMs, you also need to switch to a different kernel to later be able to make use of Cilium as the CNI plugin.
Fun with Ubuntu’s cloud images
When I started up the cluster and went to install Cilium, its containers went into a crash loop. The reason: the Ubuntu cloud image kernel, which is specialized for use with e.g. KVM, does not have all the required kernel modules available. In my case, the kernel installed was linux-image-kvm. The Cilium docs have a page detailing the required kernel config. I initially thought: those requirements will be fulfilled by any reasonably current kernel. But I was wrong, the -kvm variant of Ubuntu's kernel seems to be lacking some of the configs.
To fix this, I needed to switch to the -generic kernel. Naively, I again thought: how difficult could it possibly be? And I just ran apt remove linux-image-5.15.0-1039-kvm. That did not have the hoped-for effect. Instead, apt tried to remove that image and then install linux-image-unsigned-5.15.0-1039-kvm, which would not have been too useful. Finally, I followed this tutorial, but decided to install the -generic kernel instead of the -virtual one.
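For reference, switching kernels boiled down to something roughly like the following. The meta package names are from my memory of Ubuntu's kernel flavors, so double-check them on your own system before purging anything:
# Install the -generic kernel flavor alongside the current one
apt install linux-generic

# Reboot into the new kernel and confirm with `uname -r` that the
# -generic kernel is actually running now
reboot

# Only then remove the -kvm flavor and clean up leftover packages
apt purge linux-kvm linux-image-kvm linux-headers-kvm
apt autoremove --purge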
Installing and setting up CRI-O
As noted above, CRI-O is my container runtime of choice.
To install it, several additional APT repos need to be added. I will only show the Ansible setup for one of them here. First, we need the public keys for the repos, which I normally just download and then store in my Homelab repo. Before storing the keys, you should pipe them through gpg --dearmor.
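For example, fetching and de-armoring the key for the repo below could look roughly like this; the Release.key URL follows the usual openSUSE OBS convention and depends on the CRI-O version, so treat it as an assumption:
# Download the repo's signing key and convert it to the binary keyring format
curl -fsSL 'https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.27/xUbuntu_22.04/Release.key' \
  | gpg --dearmor > libcontainers-crio-keyring.gpg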
Setting up the keys then simply means copying them into the right directory, where apt can find them:
- name: add libcontainers cri-o repo key
  tags:
    - kubernetes
    - crio
  copy:
    src: libcontainers-crio-keyring.gpg
    dest: /usr/share/keyrings/libcontainers-crio-keyring.gpg
    owner: root
    group: root
    mode: 0644
Once that’s done, we can set up the actual repo:
- name: add libcontainers cri-o ubuntu repo
  tags:
    - kubernetes
    - crio
  apt_repository:
    repo: >
      deb [signed-by=/usr/share/keyrings/libcontainers-crio-keyring.gpg]
      https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.27/xUbuntu_22.04/ /
    state: present
    filename: libcontainers-crio
  register: libcontainers_ubuntu_repo
  when: ansible_facts['distribution'] == 'Ubuntu'
As you can see, I’m only adding the specific Ubuntu repo if the distribution actually is Ubuntu. Please note that I’ve only shown one additional repo, but there’s another one which needs to be added in a similar way.
Finally, we can install cri-o:
- name: create cri-o config dir
  tags:
    - kubernetes
    - crio
  ansible.builtin.file:
    path: /etc/crio
    state: directory
    owner: root
    group: root
    mode: '755'
- name: install cri-o config file
  tags:
    - kubernetes
    - crio
  ansible.builtin.copy:
    dest: /etc/crio/crio.conf
    src: crio.conf
    owner: root
    group: root
    mode: '644'
- name: install cri-o
  tags:
    - kubernetes
    - crio
  ansible.builtin.apt:
    name:
      - cri-o
      - cri-o-runc
    state: present
    install_recommends: false
- name: autostart cri-o
  tags:
    - crio
    - kubernetes
  ansible.builtin.systemd_service:
    name: crio
    enabled: true
    state: started
One weird thing which tripped me up in the beginning was that cri-o needs runc
,
but it doesn’t come with a dependency on cri-o-runc
.
The config file I’m using for cri-o is pretty simple, as the defaults were mostly fine for me.
[crio.runtime]
cgroup_manager = "systemd"
So at least for now, I just make sure that systemd
is set as the cgroup
manager, as the CRI’s and the kubelet’s cgroup manager need to be the same.
Installing the Kubernetes tools
Finally, we need to install a couple of Kubernetes tools. First, similar to above, we need to add the Kubernetes APT repo at pkgs.k8s.io (a sketch of that task follows right after the install task below). Then we can install the three necessary Kubernetes tools:
- name: install kubernetes tools
  tags:
    - kubernetes
  ansible.builtin.apt:
    name:
      - kubelet
      - kubeadm
      - kubectl
    state: present
    install_recommends: false
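For reference, the task adding the pkgs.k8s.io repo looks roughly like the cri-o repo task above. This is a sketch, assuming the repo key has already been de-armored into /usr/share/keyrings/kubernetes-apt-keyring.gpg and using v1.28 as an example minor version:
- name: add kubernetes apt repo
  tags:
    - kubernetes
  apt_repository:
    repo: >
      deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg]
      https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /
    state: present
    filename: kubernetes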
In addition to installing the tools, they should also be pinned to their installed versions, as an upgrade of the Kubernetes tools is not something that should happen as a side effect of a routine system package update:
- name: pin kubelet version
  tags:
    - kubernetes
  dpkg_selections:
    name: kubelet
    selection: hold
- name: pin kubeadm version
  tags:
    - kubernetes
  dpkg_selections:
    name: kubeadm
    selection: hold
- name: pin kubectl version
  tags:
    - kubernetes
  dpkg_selections:
    name: kubectl
    selection: hold
- name: autostart kubelet
  tags:
    - kubelet
    - kubernetes
  ansible.builtin.systemd_service:
    name: kubelet
    enabled: true
At the end, I’m also making sure that the kubelet is auto-started.
Setting up kube-vip as a load balancer for the control plane
If you want an HA control plane with Kubernetes, you need a load balancer to distribute requests for the Kubernetes API endpoint across the three control plane instances.
Luckily, you don’t need to migrate your Homelab into a cloud to make this work. Through some helpful comments on the Fediverse, I was pointed towards the kube-vip app. It provides a virtual IP for the control plane, notably the Kubernetes API server. In my setup, I ran it as a static pod, as I liked the idea of tying it to the kubelet, instead of running it standalone.
To do so, I put the following static pod config file into /etc/kubernetes/manifests/kube-vip.yaml
:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
    - args:
        - manager
      env:
        - name: vip_arp
          value: "true"
        - name: bgp_enable
          value: "false"
        - name: port
          value: "6443"
        - name: vip_cidr
          value: "32"
        - name: cp_enable
          value: "true"
        - name: svc_enable
          value: "false"
        - name: cp_namespace
          value: kube-system
        - name: vip_ddns
          value: "false"
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: address
          value: 10.12.0.100
        - name: prometheus_server
          value: :2112
      image: ghcr.io/kube-vip/kube-vip:v0.6.2
      imagePullPolicy: Always
      name: kube-vip
      resources: {}
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
            - NET_RAW
      volumeMounts:
        - mountPath: /etc/kubernetes/admin.conf
          name: kubeconfig
  hostAliases:
    - hostnames:
        - kubernetes
      ip: 127.0.0.1
  hostNetwork: true
  volumes:
    - hostPath:
        path: /etc/kubernetes/admin.conf
      name: kubeconfig
status: {}
Most of these options should be relatively clear. More information can be found
in the docs.
Important to note are the
vip_arp
and bgp_enable
options. These configure how the address is made
known. Because I’m definitely not a networking wizard, I went with the simpler
ARP based approach.
Also worth noting is that I disabled the svc_enable
option, which can be
switched on to allow kube-vip to act as a LoadBalancer
for Kubernetes services
of that type. To reduce initial complexity, I will be working with ClusterIP and
NodePort services for now and look at LoadBalancer type services later again,
including things like MetalLB.
The final and most important config is address
. It determines which virtual IP
address kube-vip will advertise. In my case, I also added a DNS name for that IP
into my authoritative DNS server for easier access.
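For illustration, such a record would look roughly like this in a BIND-style zone file, using the example name and address from this article:
; virtual IP for the Kubernetes API, managed by kube-vip
api.k8s.example.com.    300    IN    A    10.12.0.100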
Kube-vip should be a static pod, so it can
run (more or less) outside Kubernetes. In my setup, this is necessary because
I will point kubeadm
towards the virtual IP during the setup of the actual
cluster, so kube-vip needs to work before the cluster is actually up and
running.
Initializing the cluster
All preparations finally complete, it’s time to get ourselves a Kubernetes cluster. As I’ve noted above, I’m using vanilla k8s with kubeadm. There are two ways to initialize the cluster and add additional nodes. First, using the command line. Second, using a kubeadm init config file. I will be going with the config file approach, to be able to put the initialization under version control.
Generally speaking, command line flags and config files cannot be mixed at the moment.
The documentation for the init config file can be found here.
There is no default location for the config file, so I just put them alongside
all of the other Kubernetes configs under /etc/kubernetes
.
And here is my init config:
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
skipPhases:
  - "addon/kube-proxy"
nodeRegistration:
  kubeletExtraArgs:
{% if 'kube_ceph' in group_names %}
    node-labels: "homelab.role=ceph"
{% elif 'kube_controllers' in group_names %}
    node-labels: "homelab.role=controller"
{% elif 'kube_workers' in group_names %}
    node-labels: "homelab.role=worker"
{% endif %}
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: "10.20.0.0/16"
  serviceSubnet: "10.21.0.0/16"
controlPlaneEndpoint: "api.k8s.example.com:6443"
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
controllerManager:
  extraArgs:
    allocate-node-cidrs: "true"
clusterName: "exp-cluster"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"
First, you can see the defaults used without any flags or config files by running kubeadm config print init-defaults.
I'm actually not diverging very much from the defaults here. As you can see from the ifs in the kubeletExtraArgs, I'm using Ansible's templating engine to assign roles to the nodes via Kubernetes node labels.
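These labels only become useful once something references them. Just as an illustration of where they could be consumed later, here is a minimal sketch of a Pod pinned to my worker nodes via a nodeSelector; the Pod itself is made up for the example:
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-example
spec:
  # Only schedule this Pod on nodes carrying the homelab.role=worker label
  nodeSelector:
    homelab.role: worker
  containers:
    - name: web
      image: nginx:1.25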
Furthermore, I'm also disabling the kube-proxy initialization phase. This is because I will be using Cilium as my Container Network Interface plugin, and it can already provide the proxy functionality, which is mostly concerned with Kubernetes Service handling. So I don't want kubeadm to install kube-proxy on the nodes.
For the cluster itself, I’m also setting the service and pod CIDRs.
Important note: I’m using example values here, not my actual configs. If you see any weird inconsistencies between IP addresses or DNS names, please yell at me on Mastodon. 😉
The allocate-node-cidrs
option for the controllerManager is recommended by
Cilium.
Last but not least, the Kubernetes docs recommend setting the cgroupDriver explicitly, which I do in the KubeletConfiguration.
After this file has been defined, we can run the command to init the cluster on the first control plane node:
kubeadm init --upload-certs --config /etc/kubernetes/kube-init-config.yaml
Noteworthy here is the --upload-certs flag. Without it, the certs generated during cluster initialization will not be stored inside the cluster. As a consequence, they won't be usable when adding more control plane nodes later, and you would have to run another command to generate a new set and upload it before additional control plane nodes can join. By default, those certs only have a TTL of 24 hours. So if you plan to add more control plane nodes only after that point anyway, you can just as well skip the flag for now.
After this first node has been initialized, the next step is to copy the admin kubeconfig to your workstation for use with kubectl. You can find it under /etc/kubernetes/admin.conf.
To use this file, copy it to ~/.kube/config.
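On my workstation, that boils down to something like the following; the controller hostname is just an example:
# Fetch the admin kubeconfig from the first control plane node and
# restrict its permissions, since it grants full cluster access
scp root@controller1.example.com:/etc/kubernetes/admin.conf ~/.kube/config
chmod 600 ~/.kube/config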
And now you should be able to run your first command against your newly inaugurated
Kubernetes cluster, e.g. kubectl get all -n kube-system
.
Security note: This file contains the private key which gives you full access to the new Kubernetes cluster. Secure it appropriately. I will probably do another post once I’ve figured out what to do with it.
You will see a number of pods, mostly the Kubernetes control plane elements,
namely etcd
, kube-apiserver
, kube-controller-manager
and kube-scheduler
.
In addition, there should be a kube-vip
instance. If the kubectl get
command
fails, first check whether kube-vip
starts up correctly.
The kubectl config file we copied from the initial cluster node to the
workstation contains the address entered under the controlPlaneEndpoint
in
the kubeadm init config above. In my setup, that’s a DNS entry which points to
the virtual IP managed by kube-vip.
You will also see that the coredns pods are currently still in Pending state. That's because CoreDNS, the default Kubernetes internal DNS server, only starts up once a Container Network Interface plugin has been installed. In our case, that's Cilium.
Installing Cilium
As I've already detailed the necessary preparations above, the only thing left to do for installing Cilium is to run the install command:
cilium install --set ipam.mode=cluster-pool --set ipam.operator.clusterPoolIPv4PodCIDRList=10.20.0.0/16 --set kubeProxyReplacement=true --version 1.14.1 --set encryption.enabled=true --set encryption.type=wireguard
This command should be run on your workstation. Cilium will automatically
use the kubectl config file in ~/.kube/config
to contact the cluster and
install itself.
The clusterPoolIPv4PodCIDRList option is important here. While we already set the Pod address CIDR in the kubeadm init config file above, Cilium does not seem to pick that up and would otherwise fall back to its internal default.
In addition, I’m telling Cilium here that it should act as a replacement for
kube-proxy
. Finally, I’m enabling pod-to-pod encryption with WireGuard. This
way, I don’t have to care about encrypting traffic between pods myself, e.g.
by configuring all my services to use TLS.
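To verify that the WireGuard encryption is actually active, the Cilium docs suggest checking the agent's status output. Roughly like this, assuming the Cilium DaemonSet is running in kube-system:
# Ask one of the cilium agents about its encryption state
kubectl -n kube-system exec -ti ds/cilium -- cilium status | grep Encryption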
If the install command fails and the Cilium pods do not come up, check to make
sure that the preconditions I noted above are all fulfilled.
You should now see a single cilium-operator and a cilium pod in Running state when you execute kubectl get pods -n kube-system. Furthermore, the CoreDNS pods should now also be in the Running state.
You can check whether everything went alright by executing cilium status
on
your workstation.
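For a more thorough check, the Cilium CLI also ships a connectivity test, which deploys a few test workloads into the cluster and probes pod-to-pod connectivity:
cilium connectivity test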
Joining remaining nodes to the cluster
For joining additional nodes, I went with a similar approach as for the cluster init, using a join configuration file.
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
{% if 'kube_ceph' in group_names %}
    node-labels: "homelab.role=ceph"
{% elif 'kube_controllers' in group_names %}
    node-labels: "homelab.role=controller"
{% elif 'kube_workers' in group_names %}
    node-labels: "homelab.role=worker"
{% endif %}
discovery:
  bootstrapToken:
    token: Token here
    apiServerEndpoint: api.k8s.example.com:6443
    caCertHashes:
      - "Cert Hash here"
{% if 'kube_controllers' in group_names %}
controlPlane:
  certificateKey: "Cert key here"
{% endif %}
This file is a bit more unwieldy than the init config, because it also needs to contain some secrets. This wouldn't be a problem if those secrets were permanent: I could just store them in Vault. But they are pretty short-lived, so storing them and templating them into the file during Ansible deployments doesn't really work. So I just enter them into the file by hand, without committing the result to git.
When you run the kubeadm init
command, the output will look something like
this, provided you supply the --upload-certs
flag:
You can now join any number of control-plane node by running the following command on each as a root:
kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
The important part you don’t get without the --upload-certs
flag is the
--certificate-key
. This is required for new control plane nodes.
The values in this message fit into the JoinConfiguration
as follows:
- discovery.bootstrapToken.token: the --token value
- discovery.bootstrapToken.caCertHashes: the --discovery-token-ca-cert-hash value
- controlPlane.certificateKey: the --certificate-key value
A fully rendered version of the JoinConfiguration
file above would look like
this, using the values from the kubeadm init
example output:
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "homelab.role=controller"
discovery:
  bootstrapToken:
    token: "9vr73a.a8uxyaju799qwdjv"
    apiServerEndpoint: 192.168.0.200:6443
    caCertHashes:
      - "sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866"
controlPlane:
  certificateKey: "f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07"
If some time has passed since running the kubeadm init
command, the bootstrap
token and cert will have expired. You can recreate them by running the following
command on a control plane node which has already been initialized:
kubeadm init phase upload-certs --upload-certs
The output created will be similar to this:
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
supersecretcertkey
The last line is the new value for certificateKey
. The next step is generating
a fresh bootstrap token, as that is invalidated after 24 hours:
kubeadm token create --certificate-key supersecretcertkey --print-join-command
This will create a fresh join command you can use to join additional control
nodes to the cluster or enter into your JoinConfiguration
file.
Finally, additional worker nodes can be joined in a similar manner. Simply remove
the following lines from the JoinConfiguration
:
controlPlane:
  certificateKey: "Cert key here"
Deleting a cluster
As I had to kill the cluster multiple times but did not want to completely reinstall the nodes every time, I also researched how to remove a cluster. The steps are as follows:
- First remove the CNI plugin, with cilium uninstall in my case
- Starting with the worker nodes, execute the following commands on each node:
  kubeadm reset
  rm -fr /etc/cni
- Reboot the machine (this is for undoing the networking changes of the CNI plugin)
It is important to note the order here, always start with the worker nodes before removing the control plane nodes.
Final thoughts
First of all: Yay, Kubernetes Cluster! 🥳
This was a pretty vexing process. The research phase, before I set up a cluster for the first time, was considerably longer than for my current Nomad/Consul/Vault cluster. And I feel that that's mostly due to the differences in the documentation. HashiCorp's docs for all three tools, especially their tutorials, are top notch.
Sure, if you follow the instructions in the docs for Kubernetes and Cilium, you will relatively reliably end up with a working cluster. But it just feels like there are a lot more moving parts. And some decisions you need to make up front, like choosing a CNI plugin and a container engine.
Don’t misunderstand me, having that choice is great. As I mentioned above, I’m a fan of apps that don’t have opinions on everything, so I can make choices for myself. But in HashiCorp’s Nomad, I can also do that. I even have greater choice, because I can decide per workload which container engine I want to use, and which networking plugin I want to use.
On the bright side, at least for now I have not seen anything I would consider a show stopper for my migration to Kubernetes. As this article was a bit longer in the making, I've just finished setting up Traefik as ingress a couple of days ago, and I'm now working on setting up a Rook Ceph cluster. Let's see how this continues. 🙂
Last but not least, a comment mostly to myself: Write setup articles closer to when the actual setup happens. I'm writing a lot of this over a month after I issued the (currently 😇) final kubeadm init. I'd made some notes on important things, but I had not thought of copying the outputs of the kubeadm init or kubeadm join commands to show what they're supposed to look like. I also did not think of taking a couple of notes on the initial output of some kubectl get commands during the setup phase to show what to expect, which I think would have been nice.
The next article in the series will be about day 1 operations, writing about how I plan to handle Kubernetes manifests for actual workloads in my setup.