I am currently working on distributing my Homelab a little bit more. My main driver is high availability. Do I need high availability in a homelab setup? No, not really. But I was getting annoyed by having to take down the entire Homelab whenever I was doing an update on my single server.

The newest part of that project is my cluster controller. That is the machine running the servers for my Vault, Consul and Nomad cluster. Before the migration, this was yet another LXD VM on my homeserver. Now, it’s a Raspberry Pi 4 with 4 GB of RAM.

This single Raspberry Pi will be joined by two more, so I will end up with three instances of each of the servers. With that, I can finally reboot machines to my heart’s content without having to worry about currently running jobs. 🎉 But this high availability nirvana is currently on hold because Amazon/DHL are unable to ship me my stuff. Seriously. For years, not a single problem with deliveries. But now, all of a sudden, nothing arrives.

But enough venting. With time, it will all find its way to me. So for now, enjoy this story of my migration. It will contain a lot of nerdery, large amounts of exhilaration and, surprisingly, not a single “Darn it. I will have to re-image my entire homelab” moment.

Preparations

This actually took up a hefty part of the time I spent on this. Last weekend, I sat down with all my Homelab repos and grep, because I had used both the single cluster controller’s IP and its hostname throughout my Ansible files, Terraform files and general scripting. That was remedied by redirecting (almost, more on that later) everything to each service’s Consul-supplied DNS entry, e.g. nomad.service.consul or vault.service.consul.
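If you do something similar, a quick way to sanity-check those names is to query the local Consul agent’s DNS interface directly (8600 is Consul’s default DNS port):

dig @127.0.0.1 -p 8600 nomad.service.consul +short
dig @127.0.0.1 -p 8600 vault.service.consul +short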

With that taken care of, I read through all three tools’ HA/multi-server setup documentation. The most straightforward (I thought…) was Vault, which doesn’t actually have a clustered mode, only failover, with some capability of forwarding requests from the standbys to the current active node. Both Nomad and Consul were a bit more involved, in that they fully support running multiple servers.

Finally, I imaged the new Pi with Packer and Ansible to get it ready. To my surprise, this time around the entire playbook ran through completely without my intervention. No fixes necessary at all. 🎉

I made one important change to the Ansible scripting: I disabled all automatic enabling/starting of the three tools, because I wanted to migrate them one by one. Sadly, the OS I use, Ubuntu Server, just autostarts services right after installation, and as far as I’ve read, there is no reliable way to turn that off, because it depends on each package’s maintainer. Just to make sure, I masked the systemd service files for the tools:

systemctl mask nomad.service consul.service vault.service

This also works when the tools/service files aren’t even installed yet.
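To double-check that the mask actually took, even before the packages exist, systemd will happily tell you:

systemctl is-enabled nomad.service consul.service vault.service
# prints "masked" for each unit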

One thing to look out for: What I’m describing here was the full migration, done mostly over two evenings. Running any of these tools with only two servers for long is strongly advised against, as you run the risk of a “split brain”. This is when the two instances start disagreeing on what reality looks like, with no third one to break the tie. I had Consul running with two servers for about a day and did not see any problems, but HashiCorp strongly advises against doing that for extended periods of time.

Finally, a side note: I decided to do the migration while the Homelab was still up, but chickened out when it came to the Nomad server at the end.

Consul

Consul was the first migration target, because it needed to be on the new host so that the other two servers could be made visible via DNS.

The first step was changing two important configurations in the new server, to make sure it can cleanly join the old server:

#bootstrap_expect = 1
retry_join = ["oldserver.foo", "newserver.foo"]

The important part is commenting out bootstrap_expect, so the new server doesn’t automatically elect itself as the cluster leader.

Then I just started the Consul server after unmasking it:

systemctl unmask consul.service
systemctl start consul.service

And I was actually off to the races? This just worked out of the box. The two servers were immediately connected. They also seemed to (voluntarily) switch the leadership role between themselves. All data was immediately shared between them, and I was able to point my DNS server to the new server for .consul queries immediately.
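If you want to verify this yourself, both the cluster membership and the current leader can be checked from either server:

consul members                   # both servers should show up as "alive"
consul operator raft list-peers  # shows the Raft peers and which one is the leader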

I had to keep the old Consul server running for now, because it was also acting as the local Consul agent that the Vault and Nomad servers on the old cluster controller used to register themselves.

Then, I restarted all the Consul agents on my clients, to update them with a new retry_join value containing only the new Consul server. Also worked without any problems.
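On each client, that boiled down to a one-line change in the agent config (“newserver.foo” being a stand-in for the real hostname):

retry_join = ["newserver.foo"]

followed by a systemctl restart consul.service.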

One thing to look out for: Make sure to allow traffic to the new server from all necessary network segments in your firewall.
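What that looks like obviously depends on your firewall. With ufw, for example, it could be something like this (the subnet is a placeholder, the ports are Consul’s defaults: 8300 for server RPC, 8301 for Serf LAN, 8500 for HTTP and 8600 for DNS):

ufw allow proto tcp from 10.0.10.0/24 to any port 8300,8301,8500,8600
ufw allow proto udp from 10.0.10.0/24 to any port 8301,8600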

I finally shut down the Consul server as the last service on the old cluster controller via this command:

consul leave

For this to work, you need to have a management token set in CONSUL_HTTP_TOKEN. This makes the server gracefully leave the cluster, without creating any problems due to a missing quorum with only one of two servers remaining.
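In shell terms, on the old controller (the token value being whatever your management token is):

export CONSUL_HTTP_TOKEN="<management token>"
consul leave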

Vault

This was the “fun” one. First of all, this data was really important to me. On top of that, I had also completely misread the Vault HA documentation. The docs state very clearly that HA mode needs a compatible storage backend. They also talk a lot about Vault Integrated Storage, which uses HashiCorp’s implementation of the Raft protocol for distributed storage of the secrets store. And for some reason, I thought “yeah, this is the recommended default, I’m pretty sure I did not stray from that when I set Vault up”. But of course, I had configured the simple file storage backend.

So now I had a problem: I did not have an HA-compatible backend configured, and I did not actually intend to start an HA setup right away - just migrate my current single-server setup to a new host. What also frustrated me: I was not able to figure out whether the Raft backend actually supports running in a single-server setup. I will have to dig a bit deeper on that. Luckily, Vault already has support for migrating from one storage backend to another, vault operator migrate, documented here.
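I did not end up using it for this move, but for reference: the migration is driven by a small config file describing the source and destination backends. Something along these lines, where the paths, node ID and address are placeholders:

storage_source "file" {
  path = "/opt/vault/data"
}

storage_destination "raft" {
  path    = "/opt/vault/raft"
  node_id = "cluster-controller-1"
}

cluster_addr = "https://newserver.foo:8201"

That file is then fed to vault operator migrate -config=migrate.hcl while Vault is stopped.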

So setting up two servers, automatically syncing them and then shutting the old one down would not work. Instead, I just shut down the old one, copied everything over to the new server, and started it up. This worked nicely, as far as the data was concerned.
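That “copying” was nothing fancy, just stopping the old instance and moving the file storage directory over (the path is whatever your storage "file" stanza points at; /opt/vault/data is a placeholder):

systemctl stop vault.service
rsync -a /opt/vault/data/ newserver.foo:/opt/vault/data/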

But I had overlooked one important point. I do all my Homelab controlling via a separate machine, with a lot of my services only allowing admin actions from that machine as a security measure. And as mentioned previously, I had switched the Vault hostname to vault.service.consul. But: Consul only answers DNS queries for healthy services, and Vault reports as unhealthy as long as it has not been unsealed with vault operator unseal, the command that reconstructs the key used to decrypt the secrets on disk. When I tried to unseal my new Vault instance, the attempt failed because the client could not resolve vault.service.consul. So on my command and control host, I now access Vault via the new host’s hostname instead of the .consul address.
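Concretely, the unseal on the command and control host now looks like this (address and scheme being placeholders for your setup):

export VAULT_ADDR="https://newserver.foo:8200"
vault operator unseal
vault status   # should report "Sealed: false" once enough key shares are in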

All in all, a bit more turbulent than the Consul migration, but still okay. I did this migration live, and after a couple of minutes, some of my services became unavailable, but came back immediately once the new Vault instance was up and running.

Nomad

Finally, “the big one”. For Nomad, I chickened out and decided to take down all jobs first, just in case it went badly wrong.

For Nomad, I luckily did not need to update all of the Nomad clients’ configs. Nomad can use a local Consul agent to discover Nomad servers. This works for both Nomad clients and Nomad servers.
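The relevant piece of the Nomad agent configuration is the consul block. The two auto-join options default to true anyway, but spelling them out makes the intent obvious (the address assumes a Consul agent running locally on the default port):

consul {
  address          = "127.0.0.1:8500"
  server_auto_join = true
  client_auto_join = true
}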

The migration itself went well yet again. I only commented out bootstrap_expect = 1 again, to make sure the new server did not make itself the leader by default. And yet again, the Raft state was transferred pretty quickly.
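As with Consul, the join and the Raft peer set can be watched from either server:

nomad server members             # both servers listed, one of them marked as the leader
nomad operator raft list-peers   # shows the Raft peers and their voter status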

The problems started when I wanted to shut down the old server. This did not work at all. When left to its own devices, the new server was just spewing failed leader election errors:

"2022-12-07T23:41:58.875+0100 [INFO]  nomad.raft: entering candidate state: node=\"Node at 42.42.42:1234 [Candidate]\" term=640"
"2022-12-07T23:41:58.879+0100 [ERROR] nomad.raft: failed to make requestVote RPC: target=\"{Voter uuid-number-here 42.42.42.45:1234}\" error=\"dial tcp 42.42.42.45🔢 connect: connection refused\""
"2022-12-07T23:42:00.553+0100 [WARN]  nomad.raft: Election timeout reached, restarting election"

First, I reached for nomad server force-leave:

nomad server force-leave oldserver.global

This did not have any effect at all.

Then, I tried nomad operator raft to go down a layer:

nomad operator raft remove-peer 42.42.42.45:4646

This also did not work, but I later realized that I had probably just used the wrong port: 4646 is Nomad’s HTTP port, while Raft traffic goes over the RPC port, which defaults to 4647.
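In hindsight, nomad operator raft list-peers would have told me the exact peer address (including the RPC port) to hand to remove-peer:

nomad operator raft list-peers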

What finally worked was the leave_on_interrupt configuration option (its sibling, leave_on_terminate, did not do the trick for me). So what I ended up doing was adding leave_on_interrupt to the old server’s config:

leave_on_interrupt = true

And then starting the old server one more time and shutting it down again right away:

systemctl start nomad.service
systemctl stop nomad.service

Now the old server was removed correctly, and the new server, being the only one left, elected itself as leader as expected.

Final thoughts

So that’s it. The migration went far faster and with far fewer problems than I expected. The new cluster controller has been up for several days now, and I have not observed any problems at all.

One tip: When you want to make sure that you don’t accidentally start a service on a machine, use systemctl mask on that service.

The Pi itself is making a good impression as a cluster controller as well, with less than 5% CPU utilization on average and total memory consumption of about 500 MB for all three servers put together.

The next steps will consist of setting up the other two Pis and configuring all three servers for HA. In addition, the three Pis will also serve as MON and MGR daemons for my Ceph cluster. Due to the Amazon/DHL delivery SNAFU, I will probably not be able to set up the full HA implementation with three hosts this weekend. 😢