As mentioned in my previous post on migrating the Consul/Vault/Nomad servers from a VM to a Raspberry Pi, I was still waiting for some more Pis to arrive to extend the Nomad/Consul/Vault clusters to an HA configuration for all three. The main reason for this is not necessarily fault tolerance, but rather gaining the ability to restart the controllers without taking down the entire Nomad cluster.

Now I’d like to give a short overview of the experience and end with a look at resource consumption (spoiler: Raspberry Pi 4 4GB boards are absolutely sufficient).

The sections on Nomad and Consul are going to be pretty short, as both ran fine and needed only a few config adaptations. The Vault section is going to be the really interesting one.

On the order of doing the extension: I started out with Consul, because both Nomad and Vault can use Consul to discover the other servers in the cluster. Plus, I’m using the Vault/Nomad Consul service discovery entries in a number of places, so having the new Nomad/Vault server instances register themselves with Consul right off the bat was necessary.

One important note: Before starting this migration, make sure that all access to your servers goes through some sort of load balancing, not through the DNS name or IP of your single server. I’m using Consul for this, accessing Nomad via nomad.service.consul, Consul itself through consul.service.consul and Vault via vault.service.consul.
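
One way to do this, sketched here with each tool’s default port (the ports and the use of HTTPS are assumptions; adjust them to your TLS setup), is to point the CLI tools at the Consul service names via environment variables:

# Point all CLI/API access at the Consul service names instead of a single controller.
export CONSUL_HTTP_ADDR="https://consul.service.consul:8501"   # Consul HTTPS API (default port 8501)
export NOMAD_ADDR="https://nomad.service.consul:4646"          # Nomad API (default port 4646)
export VAULT_ADDR="https://vault.service.consul:8200"          # Vault API (default port 8200)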

I have not actually tested what happens when you just point your server access to a single controller. For Vault, going by the docs, this should work out of the box, because standby servers forward requests to the current active node. But I do not know how exactly this works for Nomad or Consul.

Consul

For Consul to go HA, the changes were minimal. I made two changes in my server config files on all three nodes:

retry_join = ["controller1", "controller2", "controller3"]
bootstrap_expect = 3

The first line provides the three controllers’ DNS names, so that each newly started server knows which nodes to connect to.

The second line expands the bootstrap expectation to three servers; previously, this was only 1. This ensures that after the cluster has gone down completely, the server instances wait until all three have made contact before starting to service requests.
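
For context, here is a rough sketch of how those two lines sit in a complete Consul server config; the datacenter and data_dir values are placeholders, not my actual setup:

# consul.hcl (sketch): only the HA-relevant settings plus minimal context.
server           = true
bootstrap_expect = 3
retry_join       = ["controller1", "controller2", "controller3"]
datacenter       = "dc1"          # placeholder
data_dir         = "/opt/consul"  # placeholder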

Afterwards, it was just a simple systemctl start consul.service, and the two new controller nodes automatically got the cluster state from the one already running instance and then started running normally:

2022-12-14T22:39:13.376+0100 [INFO]  agent: Synced node info
2022-12-14T22:39:13.037+0100 [INFO]  agent.server.raft: Installed remote snapshot
2022-12-14T22:39:13.036+0100 [INFO]  agent.server.raft: snapshot restore progress: id=138-28922885-1671053952962 last-index=28922885 last-term=138 size-in-bytes=281985 read-bytes=281985 percent-complete=100.00%
2022-12-14T22:39:12.979+0100 [INFO]  agent.server.raft: copied to local snapshot: bytes=281985
2022-12-14T22:39:12.971+0100 [INFO]  agent.server.raft: snapshot network transfer progress: read-bytes=281985 percent-complete=100.00%
2022-12-14T22:39:12.969+0100 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=3

And that’s it. If, for whatever reason, all three controllers are not available when the cluster is started cold, just reduce bootstrap_expect to 1.

There are two additional important points to take into consideration.

First, if you’re using Consul’s DNS service discovery, make sure that your DNS is configured to take all three Consul server instances into account. I’m currently running PowerDNS, using the forward-zones-recurse config with the consul domain (see the sketch below). In the future, I plan to just launch a local Consul agent on my DNS machine and point the PowerDNS recursor to it.

Second, make sure that your Consul servers are listening locally, and that your Vault and Nomad servers do not do their Consul registration via consul.service.consul. That would work fine with a single instance, but with multiple instances you might end up with multiple registrations for e.g. vault.service.consul, as the consul.service.consul domain always returns all three Consul servers. So it is random which Consul server e.g. Nomad registers its service with - and it is similarly random where Nomad tries to deregister the service! Instead, best practice is to always do the service registration against the local Consul agent - be that a client or a server.
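
For the DNS point, the recursor setup forwards the consul domain to the Consul servers’ DNS endpoints, roughly like this (the IP addresses are placeholders, and 8600 is Consul’s default DNS port):

# recursor.conf (sketch): forward the consul domain to all three Consul servers.
forward-zones-recurse=consul=10.0.0.1:8600;10.0.0.2:8600;10.0.0.3:8600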

Nomad

For Nomad, enabling HA is even simpler. I only needed to raise the bootstrap_expect setting to 3 and start the additional Nomad server instances. No other changes were necessary, because Nomad uses Consul for server discovery.
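
For reference, that setting lives in the server stanza of the Nomad server config; a minimal sketch:

# Nomad server config (sketch): only the HA-relevant part.
server {
  enabled          = true
  bootstrap_expect = 3
}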

If you do not have Consul set up, you will have to add all of your Nomad servers to the retry_join config option. This has to happen in both your server configs and your client configs, as sketched below!
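
A hedged sketch of what that could look like, reusing the hostnames from above:

# Nomad server config without Consul discovery (sketch).
server {
  enabled = true
  server_join {
    retry_join = ["controller1", "controller2", "controller3"]
  }
}

# Nomad client config without Consul discovery (sketch).
client {
  enabled = true
  server_join {
    retry_join = ["controller1", "controller2", "controller3"]
  }
}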

Vault

Vault was the most complicated migration of the bunch. This is mostly because I started out using the simple file backend, which does not support Vault’s HA functionality. When choosing a backend, the Vault documentation lists which storage backends are HA capable.

Because my backend does not support Vault HA, I first had to do a Vault backend migration. This is an officially supported process for switching backends. The docs can be found here.

I decided to switch to Vault’s integrated storage backend, as it is officially supported by HashiCorp and recommended as the default backend for HA. One important note on the integrated backend: even though it is based on the Raft consensus protocol, it also works with a single node. So you don’t have to worry about switching backends and having to enable HA right away.

The vault operator migrate command uses a special configuration file to facilitate the migration. Because I was switching from the file backend to the integrated backend, my file looked like this:

storage_source "file" {
   path = "/vault_storage"
}
storage_destination "raft" {
  path = "/new_vault_storage"
  node_id = "controller1"
}
cluster_addr = "https://controller1:8201"

The node_id config option is optional and should be unique for each server instance; I ensured this by using each server host’s hostname. The cluster_addr needs to be set to the address under which the other Vault servers can reach this server’s cluster port (8201 by default) - again, this is just the hostname for me. Two important points to note:

  1. The Vault server should be taken offline during the migration.
  2. The migration does not automatically create the destination path directory; it needs to be created manually.
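
In practice, that boils down to something along these lines (the vault.service unit name and the vault user are assumptions here; adjust to your setup):

# Stop the Vault server and create the raft destination directory by hand (sketch).
sudo systemctl stop vault.service
sudo mkdir -p /new_vault_storage
sudo chown vault:vault /new_vault_storage   # assumes Vault runs as the "vault" user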

Once the config file is written, the directories are created, and the server has been stopped, execute the migration command:

vault operator migrate -config /path/to/migration/config.hcl

Before restarting the server now, you need to make sure to also adapt the server’s config file. I removed the storage "file" section from mine, and added the following:

storage "raft" {
  path = "/vault_data"
  node_id = "controller1"
  retry_join {
    leader_api_addr = "https://controller1:8200"
  }
  retry_join {
    leader_api_addr = "https://controller2:8200"
  }
  retry_join {
    leader_api_addr = "https://controller3:8200"
  }
}
disable_mlock = true

As you might note, there is no bootstrap_expect config this time, but the raft storage backend (which is just another name for Vault’s “integrated storage”) requires setting retry_join. I just hardcoded my three servers here, but other configurations, including auto-join via some cloud providers’ APIs, are also supported. The disable_mlock setting is recommended when using the raft backend.

After that, I restarted the still-single server instance, which went through without a problem. Don’t forget to also unseal it at this point!

Then I went forward and configured the two other Vault servers. Remember to adapt the node_id config option for each server instance.

After launching the other two servers, I was greeted with a lot of error messages along these lines:

Vault is sealed

In HA Vault, each server instance still needs to be unsealed individually; just running vault operator unseal against the currently active Vault server is not enough. There is an auto-unseal feature, but it requires additional components and is geared more towards large-scale setups on one of the hyperscalers’ clouds.
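
In practice, that means running the unseal against every instance, for example like this (hostnames reused from above; each call prompts for an unseal key and has to be repeated per host until the key threshold is reached):

# Unseal every server instance, not just the currently active one (sketch).
for host in controller1 controller2 controller3; do
  VAULT_ADDR="https://${host}:8200" vault operator unseal
done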

Once I had unsealed all three servers, Vault was up and running without further problems.

In addition to the above, I also started to make use of Vault’s Consul service registration. The docs can be found here. For some reason, this functionality is only available with an HA-capable backend. With it configured, I no longer needed the homemade Vault service config file that I had previously been feeding to Consul. The config looks like this:

service_registration "consul" {
  address = "127.0.0.1:8501"
  token = "your consul token here"
  scheme = "https"
}

The important point here is to use the local Consul agent. Do not use Consul’s DNS service discovery via consul.service.consul, because that returns all Consul servers, meaning that the Vault instance might register with one Consul server, but try to deregister from another. This will leave behind stale Vault service registrations.
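
To verify which instances actually ended up registered, querying the local Consul DNS endpoint works well; a quick sketch (8600 is Consul’s default DNS port):

# List the Vault instances currently registered with Consul (sketch).
dig @127.0.0.1 -p 8600 vault.service.consul SRV +short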

Performance

Finally, a short word on resource utilization: HA carries some cost with it. I’m running the Vault/Nomad/Consul servers on three Raspberry Pis. After enabling HA, the total CPU consumption across the three servers increased by 1-2%. Not too significant, but measurable.