This page is an overview of what is currently running in my Homelab, and what it’s all running on. I won’t go into much detail about the “how”, but where I’ve already written an in-depth post, I will link it from here.
I will update this page whenever I make any major changes, to hopefully keep it up-to-date with the current state of the Homelab. Provided I remember to actually do it. 😉
Why?
So what are the goals of my Homelab? Mostly self-hosting things. It all started with “I need a convenient place to store more Linux ISOs”. And instead of doing the smart thing and investing in an external disk, I decided to buy a used server.
During the lockdowns of the corona pandemic, I really got back into it and went from a single server to a 19" rack with a lot of Raspberry Pis and a number of other machines.
I’m working on a “Homelab History” series of posts where I will go a bit more into the details especially of the “why” of the different iterations of my Homelab.
Hardware
My hardware philosophy for physical hosts in the Homelab went from “single server” to “lots and lots of SBCs”. The main driver for that was my wish to simplify host updates and reboots. By having enough physical hosts, I could just take them down one-by-one without having to take down the entire Homelab. This goal could, to some degree, also be achieved by a good VM setup and perhaps fewer, but larger hosts than mine - but having a bunch of smaller machines appealed to me.
The Rack
Most of my Homelab is located in a StarTech 24U 4-post rack.
The top tray holds a small assortment of hosts and my external HDD backup disk. The two servers in the middle are the Turing Pi 2 boards with a total of eight Pi CM4. The two servers at the bottom are storage servers, with one HDD and one SSD each.
The desktop case to the right houses my old home server, which I reactivated for extra resources during the migration to Kubernetes.
The idea to rack everything up came relatively late in my “spread out the Homelab” plan, after I set up a couple of machines in separate desktop cases. It ended up looking like this:
When I then wanted to add another two desktops for my other two storage nodes, I decided that going with a rack was just the right thing to do.
Physical hosts
Let’s start with some special form factor hosts. They’re mostly odds and ends which I gathered over the last couple of years and integrated into the Homelab.
Command & Control
I wanted to have control over my Homelab from everywhere, but at the same time I did not want to spread the necessary credentials over too many hosts. So instead of doing that, I decided to introduce this host, my Command and Control host. It doesn’t host anything itself, but instead just serves as the host which has all the necessary credentials and configs and network permissions to access everything. If I then need to do something remotely, I just need to log in, instead of having to remember to sync e.g. k8s certs over to all my machines.
The machine itself is a PC Engines APU2e4. I originally bought it to run OPNsense, but then found out that there’s an issue between PPPoE, which I use for Internet connectivity, and FreeBSD. In short, routing the connection needed more CPU power than this machine had. So instead of throwing it out, I repurposed it.
It has the following internals:
- AMD GX-412TC SOC, 4C/4T 1 GHz base clock, 1.4 GHz boost
- 3x Intel i210 1 GbE NICs
- 4 GB RAM
- 16 GB SATA SSD
OPNsense router
My Router/Firewall is running OPNsense. It is running on an NRG Systems IPU 672, again more of an “appliance” than anything else. I mostly used it because it was passively cooled and had enough NICs. And in contrast to the previous host, I was confident that its CPU would be able to route my internet connection without issue.
It has the following internals:
- Intel Core i5-7200U 2C/4T 2.5 GHz base clock, 3.1 GHz boost
- 8 GB RAM
- 64 GB mSATA SSD
- 6x Intel i211-AT 1 GbE NICs
Storage hosts
I have a total of three x86 machines which serve as my storage hosts. Right now, I’ve also got a fourth machine hosting another couple of disks, but that’s only temporary - more on that later.
My three storage hosts are quite the mix of machines. The first one is an Odroid H3. It has the following stats:
- Intel N5105 4C/4T 2 GHz base and 2.9 GHz boost clocks
- 16 GB RAM, 2x 8 GB sticks
- 2x 2.5 GbE NICs
- 1x Crucial P2 250 GB NVMe SSD for root
- 1x Samsung 870 EVO 1 TB SATA SSD
- 1x Seagate Ironwolf 4 TB HDD
I initially planned to buy more of these, but then realized that while the CPU could easily handle more Ceph drives, the board was running out of SATA ports and drive power connectors. In the end, I decided to go with standard form factor machines for storage, for the sake of better extensibility.
Next up is an older machine: my old home server, from before it all turned into a Homelab. It served in that role from 2018 to 2021, spent a couple of years in the basement, and is now back in service as a storage host. Its parts were still good, and its CPU is still perfectly fine for some storage work. I’ve got a more extensive post about it here.
It currently has the following innards:
- AMD A10-9700e 4C/4T APU with 3 GHz base and 3.5 GHz boost clock
- 16 GB of RAM
- MSI A320M Pro VD/S
- Kingston 240 GB SATA SSD for root FS
- 1x Samsung 860 EVO 1 TB SATA SSD
- 1x WD Red 4 TB HDD
The final storage host is made up mostly of new parts. I went for a low TDP CPU and again one HDD and one SSD.
The machine has the following specs:
- Intel i3-12100T 4C/4T 2.2 GHz base clock, 4.1 GHz boost clock
- MSI Bazooka B660 mainboard
- 32 GB RAM
- Crucial P2 NVMe 250 GB SSD
- Samsung 1 TB 860 EVO SATA SSD
- WD Red 4 TB HDD
The fleet of Raspberry Pis
I started my Homelab with a single server. After some time, I got a bit annoyed by the fact that during host updates, I had to shut down everything. I ended up deciding to switch my entire Homelab to a fleet of smaller machines, so that I would be able to restart any of them without having to shut down the entire Homelab. At the time, the best choice for something with low power draw, good software support and a small physical footprint seemed to be the Raspberry Pi 4. I’m now the proud owner of 13 of them: five Raspberry Pi 4 8 GB boards and eight CM4, also in the 8 GB variant.
Here are the five SBCs:
Each of the Pis has a Kingston 250 GB SSD connected via a USB adapter. I can recommend the StarTech USB to SATA adapter. I’m using it on all five Pis and never had a problem with it.
The Pis are mounted in the Racknex UM-SBC-207.
The Pi CM4 are mounted in Turing Pi 2 boards.
The great thing about these boards is that they follow standards: they use the mini-ITX form factor and take power from a standard ATX power supply. They also let me pack a good amount of compute into a relatively small space. I wrote about the boards a bit more extensively here.
None of the Pis on these boards has any storage at all, not even an SD card. They all netboot and use a Ceph RBD for their root FS. I’ve written a detailed series about this setup here.
The token x86 host
As I’ve noted above, my main cluster hosts are all Raspberry Pis, which have ARM CPUs. To ensure that I don’t run into any issues due to some software I’d like to run not having ARM support, I also bought a small x86 host, the UDOO X86 II ULTRA.
This small machine also does not have any storage and is netbooted as well. I’ve written a bit more about it here.
The k8s expansion host
At the time of writing, I’m migrating my Nomad cluster to k8s (more on the software stack will come later). To be able to do the migration incrementally, I reactivated my previous home server. It’s now running another Ceph node as well as the k8s control plane and two worker VMs. It will be removed once I’m done with the migration to k8s.
The machine has the following specs:
- Intel i7-10700 8C/16T 2.9 GHz base/4.8 GHz boost clock
- Gigabyte W480M Vision W mainboard
- 64 GB RAM
- Seagate Ironwolf 8 TB (Ceph)
- Crucial MX500 2 TB (Ceph)
- 2x Samsung Pro 512 GB (Root FS, VM disks)
Odds and ends
My entire Homelab is currently running on 1 Gbps LAN. I have two switches for this, both from Netgear:
- Netgear GS108E
- Netgear GS116Ev2
Both of these are part of Netgear’s “Plus” series of Gigabit switches. They are “semi-managed”, meaning they have a couple of interesting features compared to unmanaged switches, the most important to me being VLAN support. I’m generally quite happy with them, but I will need an upgrade soon, as I want to switch to 2.5 Gbps LAN for at least some hosts.
The 16 port switch serves as my main Homelab switch, and the 8 port one is sitting under my desk for use by my desktop and some of the following networking equipment.
My WiFi access point is a TP-Link Archer C7 I flashed with OpenWRT.
For ZigBee communication with a couple of temperature sensors around my flat, I’ve got a TubesZB CC2652P2 based Zigbee to Ethernet/USB Serial Coordinator.
Operating system
I standardized on Ubuntu Server a long time ago. I was going to use Debian at first, and had most of my machines on it already, but then I started integrating the first Raspberry Pis, and for some reason Debian did not work properly. It had some issues with being really, really slow. I’m reasonably happy with Ubuntu for now. At this point, the baremetal OS isn’t doing very much anyway.
Host creation
For my hosts, be they physical machines or VMs, I’m creating images with HashiCorp Packer. Packer downloads an Ubuntu Server image and then either launches a QEMU VM (for x86 hosts) or a chroot with qemu-static (for ARM hosts). It then applies a relatively short Ansible play against that image to do some basic configuration, e.g. setting up SSH. Finally, I either use a USB stick to install the image or put it onto a Ceph RBD volume, depending on whether the host will have internal storage or netboot.
This setup has allowed me to image a lot of hosts in a relatively short time. Some details on this setup can be found in this post.
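To give a rough idea of the Ansible step, here’s a minimal sketch of what such an image-preparation play could look like - the user name and key path are placeholders, not my actual configuration:

```yaml
# Hypothetical sketch of the image-preparation play Packer runs.
# User name and key file are placeholders.
- hosts: default
  become: true
  tasks:
    - name: Disable SSH password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'

    - name: Authorize my SSH key for the default user
      ansible.posix.authorized_key:
        user: ubuntu
        key: "{{ lookup('file', 'files/homelab.pub') }}"
```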
Host management
For managing hosts, I’m mostly using Ansible. I have two major playbooks I’m regularly using. The first one is just called deployment.yaml. It contains everything: if I ran it against my Homelab hosts with completely clean disks, I should end up with the same Homelab I’ve got now. The config in this playbook ranges from basic things like Prometheus node_exporter configs up to installs of Kubernetes or Nomad. I barely ever run the entire playbook on my Homelab anymore. Instead, I run it with specific tags or against specific hosts. But I do regularly run the entire playbook in --check mode, to see whether anything has drifted away from the defined configuration.
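Structurally, you can picture deployment.yaml as something like the following sketch - the role and group names are made up for illustration, and the real playbook is a lot longer:

```yaml
# Hypothetical structure of deployment.yaml - role and group names are examples.
- hosts: all
  roles:
    - { role: base_config, tags: ['base'] }
    - { role: node_exporter, tags: ['monitoring'] }

- hosts: storage
  roles:
    - { role: ceph_host, tags: ['ceph'] }

- hosts: k8s_nodes
  roles:
    - { role: kubernetes, tags: ['k8s'] }
```

A normal run then only touches one of those tags or a --limit on a host group, while the full run happens in --check mode.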
I find Ansible a really useful tool. And I also like having my host configs under version control. It makes it a lot easier to solve those “huh, why did I do this” mysteries. The only thing which took some getting used to was to not run any commands on the hosts themselves anymore, but to always go through Ansible.
VM management
Due to the ongoing migration to Kubernetes, I’m currently running a number of Virtual Machines, hosting most of the k8s cluster. I wanted their configuration under version control too. I’m using LXD to manage them, and I found the LXD Terraform provider to work quite well.
Storage
I’m running most of my storage off of a three-node Ceph cluster. Ceph is a great way of running storage, because the same set of disks can provide block devices, a POSIX-compatible file system and S3-compatible object storage.
I’m using Ceph for all three. My fleet of Raspberry Pis is completely without storage attached to them. They netboot, and then have their root disks provided as Ceph RBDs. I’ve also got a large CephFS, which I mostly use for mass storage. You know, Linux ISOs and such. Then I can mount that on my desktop and in my Homelab, connected to Jellyfin. And wherever I can, I’m using S3 as the storage method. I just love that it’s a vast lake of storage, where I don’t have to worry about things like filesystems running full.
Both my Nomad and my k8s cluster use Ceph for provisioning volumes for jobs through the Container Storage Interface spec.
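On the k8s side, that boils down to a StorageClass pointing at the RBD CSI driver, roughly like this sketch - the cluster ID, pool name and secret references are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-replicated          # placeholder name
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-fsid>        # placeholder: the Ceph cluster's fsid
  pool: kubernetes              # placeholder pool name
  csi.storage.k8s.io/fstype: ext4
  # provisioner/node secret references omitted for brevity
reclaimPolicy: Delete
allowVolumeExpansion: true
```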
I’ve normally got three hosts in a cluster. Right now, due to the k8s migration, I only have two storage hosts in the baremetal cluster. Each of them has a 4 TB HDD and a 1 TB SSD. I’m running all of my Ceph pools in replicated mode, so that each piece of data is stored on two different hosts.
Due to the k8s migration, I’ve currently also got a Rook Ceph cluster. This will at some point replace my baremetal/cephadm cluster and I will migrate all of my current baremetal hosts into the k8s cluster.
For a more in-depth description of Ceph, have a look at this post, and for a description of my Rook Ceph setup, look here.
The Nomad cluster
My HashiCorp Nomad cluster is currently my main workload orchestrator. It is using HashiCorp Vault for secret storage and HashiCorp Consul for service discovery and mesh networking.
I have three server instances of Nomad/Vault/Consul running on three Raspberry Pi 4 8 GB, which have proven to provide more than enough CPU and RAM to run all three comfortably.
The majority of the clients attached to the cluster, each running a Nomad and a Consul client, are also Raspberry Pi 4 8GB, with my one Udoo x86 host as an exception, in case I ever meet a workload without ARM support.
The Nomad cluster currently uses Docker as the underlying container runner. I’d have liked to get rid of it, but none of the alternatives supported all of the features I got so used to.
I’m extremely happy with this setup. All three HashiCorp tools are well-made, and the documentation is top-notch. I’ve never had serious issues with it.
I initially decided to use Nomad/Consul/Vault because Kubernetes looked too daunting. I basically read the k8s install docs up to the point where you’re supposed to choose a CNI plugin, and not only did I not know which one to choose - I even failed to come up with a way to make the choice. Thus turned off, I went to Nomad, and even with the additional load of Vault and Consul, I found it to be a lot more accessible.
With all of that being said: I’m moving away from Nomad and Consul at least; I still have to look closer at Vault. I will replace Nomad with k8s. The reason lies in HashiCorp’s switch to a “Source Available” license for all of their products back in 2023. I didn’t feel comfortable with that, and then the decision was made for me by some changes in the ToS for the Terraform Registry, excluding the open source Tofu fork of Terraform from using it. That felt a lot like spite to me, and killed the remainder of trust I still had in HashiCorp.
I’m now in a long term project of migrating my cluster to Kubernetes. If you’d like to read about it, start here and look at all of the posts under the k8s-migration tag.
Secrets with Vault
For the secrets management in my entire Homelab, including the Nomad and k8s clusters as well as my Ansible playbooks, I’m using HashiCorp’s Vault. As far as secrets management goes, it is a pretty accessible piece of software. I’ve got three servers running on three of my Pis. Its integration into both Nomad and k8s is great. For managing the many different policies and roles I need to properly separate the different Nomad and k8s workloads’ secrets, I’m using the Vault Terraform provider, so that I don’t need to remember the right command line incantations every time I need to do something.
Networking with Consul
With many different hosts running many different containers which need to talk to each other, some sort of service discovery is necessary. And there’s always the question of encrypted traffic between containers. For both of these, I found Consul to be a great solution.
I need to run a Consul agent on every Nomad cluster host. Nomad then automatically feeds the Consul cluster with information on which job is running where, and an Envoy instance is started alongside every job that runs with Consul service mesh support. With this support, the job’s container ports are not directly exposed on the host, but only through the Envoy proxy. Consul generates an mTLS certificate for each of the services defined with Consul support, and the Envoy proxy running in front of every service then only allows connections which come in with the correct certificate. Consul also allows detailed configuration of which service is allowed to talk to which other service. All traffic between the services is then routed through their pair of Envoy proxies, mTLS encrypted. This way, I don’t have to muck around with providing each service with the correct TLS certificates myself, let alone with setting up some sort of mTLS infrastructure.
The service discovery just works out of the box via the Nomad job definition for services running in the Nomad cluster. In addition, I’m using Consul’s DNS server to get the IPs of the services which provide public ports and serve as ingress into the cluster - currently only the Traefik instance running in the cluster.
Like for Vault, I’m also using the Terraform provider for Consul to do some of the configuration which would otherwise be done via the command line interface. The main usage for this is to define which services can actually talk to which other services.
Storage with Ceph
For storage for my Nomad jobs, I’m using Ceph. Nomad has implemented the Container Storage Interface specification. And because of that, the official Ceph CSI drivers work without issue in Nomad as well, although they officially only support Kubernetes.
The Kubernetes cluster
As I’ve mentioned above, I’m not only the proud owner of a Nomad cluster for workloads, but similarly the proud owner of a Kubernetes cluster.
For the ongoing series of articles on the migration, have a look under the k8s-migration tag.
I went with the official kubeadm distribution, mostly because I like “vanilla” editions.
Besides the main driver behind the decision, the licensing change, there’s one thing I hope to improve about my Homelab with the migration: making better use of my relatively beefy storage hosts for general usage. Right now, they’re only running the baremetal Ceph cluster, which uses only a small portion of their compute resources. With the k8s cluster, I will run Ceph via Rook, which runs all of the Ceph containers via Kubernetes. So while the hosts will still be a bit special, they will also be used for general workloads, not just Ceph.
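To give an idea of what that looks like in practice, here’s a heavily stripped-down sketch of a Rook CephCluster manifest - the image tag, node names and device names are example values, not my actual ones:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18    # example image tag
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: storage-node-1        # placeholder hostname
        devices:
          - name: sda               # e.g. the 4 TB HDD
          - name: sdb               # e.g. the 1 TB SSD
```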
Networking with Cilium
I had mentioned above that when I decided to go with Nomad for my initial Homelab cluster, it was partially because of the complexity of Kubernetes, and in particular the choice of a Container Networking Interface plugin. I hit that same point again when setting up my k8s cluster. And still not even knowing how to choose, let alone what to choose, I went with a simple method.
Principal Lead Architect Dice decided that I should be using Cilium as my CNI plugin.
This proved to be a good decision, and I have had no serious issues with it.
I’m not just using it for internal cluster networking, but also use it to provide LoadBalancer functionality for Services via BGP. For details on the setup, see this post.
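Roughly, that setup consists of an IP pool for LoadBalancer Services plus a BGP peering policy towards my router, along the lines of this sketch - the CIDR, ASNs, label and peer address are placeholders, and the exact CRD versions depend on the Cilium release:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 10.86.5.0/24            # placeholder LoadBalancer IP range
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: homelab-bgp
spec:
  virtualRouters:
    - localASN: 64512               # placeholder ASN
      exportPodCIDR: false
      serviceSelector:
        matchLabels:
          announce: bgp             # placeholder label picking which Services to announce
      neighbors:
        - peerAddress: 10.86.0.1/32 # placeholder: the router's address
          peerASN: 64513
```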
Software
Logging
Let’s start with my logging setup. My main logging aggregator is FluentD. Its job is to receive logs from a number of sources and parse them a bit. The end goal is to parse out the following components from every log line:
- Date/Time
- Log level
- Log message
Once a line has been parsed, the record is forwarded to a Loki instance, which is used for long-term storage and querying. It uses S3 from my Ceph cluster for said storage.
The log gathering is done on each individual host. The host’s own journald logs are collected by syslog-ng and then forwarded verbatim to my FluentD instance. My OPNsense box, OpenWrt access point and even my DSL modem also forward logs to FluentD. Logs from my Nomad jobs and k8s pods are gathered by Fluent Bit and also forwarded to the central FluentD instance.
For more details on the logging setup, have a look at this post.
Metrics
I love graphs. I mean, I really really do. I think the only internal website I visit more often than my Grafana instance is Mastodon.
For my metrics stack, I’m using Prometheus and Grafana. Both are currently deployed via the kube-prometheus-stack. I do not endorse this solution, due to their utterly atrocious release engineering. Or rather the utter, complete lack thereof. But it’s working for me for now.
I’m mostly scraping my infrastructure at this point; adding scraping of individual apps is still on my list. But even that is already a lot, including both k8s and Nomad, all of my hosts via node_exporter, and even my DSL modem via SNMP. In addition, I’m also scraping some smart plugs and Zigbee thermometers.
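For the hosts outside the cluster, the scrape configuration is nothing fancy - roughly like this sketch, with placeholder hostnames (with kube-prometheus-stack, extra jobs like this typically end up in the additionalScrapeConfigs value):

```yaml
scrape_configs:
  - job_name: node_exporter
    static_configs:
      - targets:
          - storage-node-1:9100     # placeholder hostnames
          - pi-worker-1:9100
          - pi-worker-2:9100
```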
For details on the setup, see this post for Prometheus and this one for Grafana.
Harbor for container image storage
To store container images, both for caching of outside images and for my own images, I’m using Harbor. It has an okay GUI and supports my use case pretty well.
For an article on the setup, see here.
External-secrets for secrets handling via Vault
When starting my k8s migration, I already had HashiCorp’s Vault set up for all manner of secrets in my Homelab, so I decided to continue using it. Luckily, I found external-secrets pretty quickly. It was easy to set up and allows me to continue handling my secrets through Vault. It fetches secrets from Vault and creates Kubernetes Secrets from them, including some nice templating. That templating gets rather funny when you’re deploying a templated ExternalSecret via Helm. 😅
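As a rough sketch, an ExternalSecret then looks something like this - the store name, Vault path and key names are placeholders:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault                  # placeholder ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: example-app-secrets    # name of the resulting Kubernetes Secret
  data:
    - secretKey: db-password     # key inside the Kubernetes Secret
      remoteRef:
        key: secret/example-app  # placeholder Vault path
        property: password
```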
I’ve documented my setup here.
external-dns for DNS handling
DNS handling, at least for my services running inside the cluster, is pretty simple: all of the different DNS names need to point to the IP of my Traefik instance. I did that with PowerDNS and Terraform for a long time. But with k8s, I can now do it from inside the cluster - still using the PowerDNS API, but now driven by external-dns.
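Configuration-wise, that mostly comes down to pointing external-dns at the PowerDNS API - a hedged sketch of the relevant container arguments, with the server URL, domain and API key handling as placeholders:

```yaml
# Hypothetical excerpt of the external-dns container spec.
args:
  - --provider=pdns
  - --pdns-server=http://powerdns.example.internal:8081   # placeholder API endpoint
  - --pdns-api-key=$(PDNS_API_KEY)                        # injected via a Secret
  - --source=ingress
  - --domain-filter=example.internal                      # placeholder domain
```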
CloudNativePG for Postgres databases in k8s
I’m using PostgreSQL as my DBMS of choice. In my Nomad cluster, I’m running it as a job, with a single instance shared between all jobs which need a database.
In my k8s cluster, I went with CloudNativePG. So instead of manually managing a single instance, I’ve now got per-app manifests which define a separate Postgres database instance, including secondaries and automated backups. I haven’t migrated many of the apps using it to k8s yet, but so far it looks pretty stable.
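Such a per-app manifest is pleasantly short - a minimal sketch, with names and sizes as placeholders and the S3 backup configuration left out:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-app-pg
spec:
  instances: 2                  # one primary plus one secondary
  storage:
    size: 5Gi
  bootstrap:
    initdb:
      database: example_app
      owner: example_app
  # backup via barmanObjectStore to S3 omitted for brevity
```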
The documentation of the setup can be found here.
Ingress with Traefik
For my reverse proxy, I’m using Traefik. In my Kubernetes cluster, it’s deployed via the official Helm chart, and in my Nomad cluster as a plain Docker container. I especially like it for its versatility: it supports a wide range of providers, so manual configuration is kept to a minimum, and I can do the reverse proxy configuration close to the rest of the app configuration. For Nomad, I’m using the official Consul catalog provider. It automatically reads the information about the backends from Consul, and so it also enables and disables routes depending on Consul’s health checks: if Consul removes a service because its health checks fail, Traefik automatically disables the associated routes.
For the k8s deployment, I’m using the Kubernetes Ingress support for apps which come with their own Helm charts and Ingress manifests. For my own Helm charts, I’m using IngressRoutes, Traefik’s own custom resource, which allows configuring Traefik options for the routes directly in the manifest instead of via annotations on a standard Ingress.
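A minimal IngressRoute sketch for one of my own charts might look like this - the hostname, service name and entry point are placeholders, and the apiVersion differs between Traefik v2 and v3:

```yaml
apiVersion: traefik.io/v1alpha1   # traefik.containo.us/v1alpha1 on Traefik v2
kind: IngressRoute
metadata:
  name: example-app
spec:
  entryPoints:
    - websecure                   # placeholder entry point
  routes:
    - match: Host(`app.example.internal`)
      kind: Rule
      services:
        - name: example-app
          port: 8080
```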
Details on the k8s deployment can be found in this article.
Audiobookshelf for podcasts
Audiobookshelf is an absolutely great app for listening to podcasts and audio books. It can be used via its own web UI and the player integrated there, or via its Android app.
It works with podcasts’ RSS feeds.
More details on my setup can be found here.
Wallabag for read it later bookmarking
Wallabag is a read-it-later app for bookmarking articles and websites. I’m using it a lot, and it has a good Firefox extension as well as an Android app.
Paperless for digital document handling
Paperless is an app for digital document handling. I’m using it to finally get rid of all the paper one has to contend with. Anything that still arrives in dead tree format gets scanned in. Paperless then runs OCR on it and adds some fitting tags via machine learning. The Web UI allows viewing documents as well as searching by type, tags or the entire OCR’d content.
Nextcloud for file sharing, calendaring and contacts
Nextcloud is one of the apps which started off my Homelab, still as OwnCloud back then. I’m using it mostly for convenient file sharing between my different devices, but also for calendar and contacts.
Recently, I also started using it for note taking on my phone, to have those shared on my own infrastructure, instead of Google’s.
Mastodon for social media
Mastodon is my social media home. It’s one of many apps which connects to the Fediverse.
I’m loving it.
Jellyfin for movies and TV shows
I’m using Jellyfin for organizing and watching my media library. It’s been serving me well over many years, and runs perfectly fine from a Raspberry Pi, although without HW acceleration. So if you want to do media transcoding, you will want to look for something a bit beefier.
Keycloak for single sign-on
To avoid having to have a different account on each of my Homelab apps, I’m running Keycloak. It is currently connected to my Mastodon, Grafana, Gitea and Nextcloud instances. In addition to SSO, it also provides me with 2FA, via FreeOTP.
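As one example of what such an integration looks like, here’s a hedged sketch of Grafana’s generic OAuth section as it could appear in Helm values - the realm name, URLs and client ID are placeholders, and the client secret would come from a Secret:

```yaml
grafana.ini:
  auth.generic_oauth:
    enabled: true
    name: Keycloak
    client_id: grafana                 # placeholder client ID
    scopes: openid profile email
    auth_url: https://keycloak.example.internal/realms/homelab/protocol/openid-connect/auth
    token_url: https://keycloak.example.internal/realms/homelab/protocol/openid-connect/token
    api_url: https://keycloak.example.internal/realms/homelab/protocol/openid-connect/userinfo
```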
I’ve described my SSO setup in detail here.
SNMP exporter for DSL modem metrics
A while ago, I found out that my DSL modem exposes some metrics via SNMP. I’m using SNMP exporter to get that data into my Prometheus instance.
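The scrape job for that follows the usual snmp_exporter pattern, where Prometheus asks the exporter to walk the actual target - a sketch with placeholder addresses and an example module name:

```yaml
- job_name: snmp-dsl-modem
  metrics_path: /snmp
  params:
    module: [if_mib]                    # example module from the generated snmp.yml
  static_configs:
    - targets:
        - 192.168.1.1                   # placeholder: the modem itself
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: snmp-exporter:9116   # placeholder: where snmp_exporter runs
```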
Gitea for Git hosting
I’ve got a veritable graveyard of projects. To store all of them and have a single place to despair over how I’m not getting ahead with any of them, I’m using Gitea. It’s a good piece of Git hosting software, but recently went a lot more corporate. I’m already planning to switch to the community-led Forgejo project after migrating Gitea to the k8s cluster.
Zigbee2MQTT for connecting my thermometers to MQTT
Zigbee2MQTT connects to a variety of Zigbee devices, translates their data points into MQTT messages and sends them to an MQTT broker.
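The configuration for that is short - a hedged sketch, with the broker address and the coordinator’s network address as placeholders:

```yaml
# Sketch of a Zigbee2MQTT configuration.yaml for a network-attached coordinator.
mqtt:
  base_topic: zigbee2mqtt
  server: mqtt://mosquitto.example.internal:1883   # placeholder broker address
serial:
  port: tcp://tubeszb.example.internal:6638        # placeholder coordinator address/port
frontend:
  port: 8080
```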
Docker-mailserver as an internal SMTP server
I’m running an instance of docker-mailserver internally to provide apps like Mastodon or Gitea with the ability to send mails. At the moment, it only serves as a relay for outgoing mail, handing messages to my DNS hoster’s mail server, which then does the actual delivery.
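Configuration-wise, the relay part is just a handful of environment variables - a hedged compose-style sketch, with the relay host and credentials as placeholders:

```yaml
services:
  mailserver:
    image: ghcr.io/docker-mailserver/docker-mailserver:latest
    environment:
      - DEFAULT_RELAY_HOST=[smtp.example-hoster.com]:587   # placeholder relay host
      - RELAY_USER=mail@example.com                        # placeholder credentials
      - RELAY_PASSWORD=changeme
```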
Uptime-kuma for service monitoring
To check whether everything is up and running in my Homelab, I’m using Uptime-Kuma. It’s a nice solution with a variety of checkers, ranging from HTTP and DNS to plain TCP.
I’ve got it set up to check a lot of things, including whether my internet is up, all of my apps are reachable or DNS is working.
This is the one piece of software I would like to move out of my Homelab, or at least set up a secondary instance somewhere else, to check whether my public services are reachable from the outside. Because if e.g. my Nomad cluster is down, the Uptime Kuma instance is also going to be down, making it useless for debugging problems with my core infrastructure.
Mosquitto as an MQTT broker
As an MQTT broker, I’m using Mosquitto. It’s collecting MQTT messages directly from my smart plugs and indirectly via Zigbee2MQTT from my thermometers.
I’ve written about the Mosquitto and smart plug power measurement setup in this post.
Drone CI for CI
I’ve got Drone CI deployed as a CI solution. It’s mostly used for building some internal Docker images, but also runs some automated tests and linters for my private projects. I’m using it instead of Gitea’s internal CI functionality simply because I set it up before Gitea had its own.
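A typical pipeline for one of those image builds looks roughly like this sketch - the registry, repository and secret names are placeholders:

```yaml
kind: pipeline
type: docker
name: build-image

steps:
  - name: build-and-push
    image: plugins/docker
    settings:
      registry: harbor.example.internal
      repo: harbor.example.internal/library/example-app
      tags:
        - latest
      username:
        from_secret: harbor_username
      password:
        from_secret: harbor_password
```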
I’m planning to switch to Woodpecker as part of the k8s migration, mostly because Drone CI looks half-inactive these days and Woodpecker is a proper, community-led open source project.