In the course of spreading my homelab over a couple more machines, I finally arrived at the Ceph cluster’s MON daemons. Up to now, these were running in three Ceph VMs on my main x86 server. In this post, I will describe how I moved them to three Raspberry Pis, with the goal of keeping the cluster up the entire time.
First, a couple of considerations:
- MON daemons use on average about 1GB of memory in my cluster
- My cluster, and most of my services, went down during the migration. So please be cautious if you plan to do your own migration.
The MON daemons are something of a control plane for Ceph clusters. They hold the cluster maps, including the MON map and the OSD map, which describe the daemons and data locations. Every client that uses the Ceph cluster contacts them to get a map of the available OSDs to work with.
Please Note: Be cautious with this! If you lose all three of your Monitors, your cluster is broken.
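As an aside, you can always check which monitors the cluster itself currently knows about by dumping the monitor map. This is a standard Ceph command and works the same on any cluster:

```
# print the current monitor map: epoch, fsid, and one line per MON with its address
ceph mon dump
```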
Due to the centrality of the MON daemons for both the cluster itself and any clients, a lot of places potentially hold the IPs of your monitors. Most of the time, that will be in the form of `ceph.conf` files.
Clients generally do not automatically receive new MON addresses. They will need to be updated manually!
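For clients, the relevant bit is usually the `mon_host` line in their `ceph.conf`. A minimal example, with placeholder IPs and fsid rather than my actual values, looks roughly like this:

```
# /etc/ceph/ceph.conf on a client, example values only
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_host = 192.0.2.11,192.0.2.12,192.0.2.13
```

That `mon_host` line is what has to be updated on every client when the MONs move.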
So how did I do it all? I started out with migrating a single daemon. My thinking here: I can migrate one daemon, then update all three MONs’ addresses to their new values everywhere, and then I can migrate the other two daemons as well.
For the sake of this article, let’s assume that the old MONs are located on `oldhost1`, `oldhost2` and `oldhost3`, and the new hosts are called `newhost1`, `newhost2` and `newhost3`. Also note that I’m running a `cephadm` cluster.
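One assumption I’m making here: the new Raspberry Pis are already known to cephadm as cluster hosts. If they are not, they have to be added first, roughly like this (the hostnames are the placeholders from above):

```
# make the cluster's SSH key known to the new host, then register it with the orchestrator
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@newhost1
ceph orch host add newhost1
```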
So to begin with, a single daemon can be migrated by using the `ceph orch apply` command:
ceph orch apply mon --placement "newhost1,oldhost1,oldhost2"
This will disable the MON on `oldhost3` and place a fresh one on `newhost1`. The MON daemons on `oldhost1` and `oldhost2` will not be touched at all and continue running.
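To check that the orchestrator has actually converged on this new placement, I find these two standard commands useful; the daemon names in the output will of course match your own hostnames:

```
# list the MON daemons the orchestrator manages and the hosts they run on
ceph orch ps | grep mon
# show which monitors are currently in quorum
ceph mon stat
```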
At this point, nothing much can go wrong in cluster operations. Any connected clients will automatically go searching for another MON daemon and find either `oldhost1` or `oldhost2`. But note: those clients will not automagically get the IP of `newhost1` added to their potential MONs. Many parts of the cluster, including the MON daemons on `oldhost1` and `oldhost2`, will be informed about the new MON daemon.
But other parts of the cluster will not. Among the daemons which will not
automatically get the new MON address are the OSDs and NFS daemons.
At this point, I was not aware that there was any kind of problem.
I then adapted all of the `ceph.conf` files and other places where the MON IPs are mentioned. These were:
- Ceph CSI jobs running in my Nomad cluster
- `ceph.conf` files on a number of unmanaged physical hosts
- The kernel command lines of my netbooting hosts, which contain the MONs
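Finding all of those places is mostly a matter of grepping for the old MON addresses. Something along these lines can help, though the paths and the example IP are purely illustrative and will differ per setup:

```
# hypothetical sweep: list files under a few likely locations that still mention an old MON IP
sudo grep -rln "192.0.2.11" /etc /srv/tftp 2>/dev/null
```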
This was where I diverged from my original plan. Instead of just replacing the IP of `oldhost3` with the one of `newhost1`, I went ahead and replaced all of them.
And here’s where the problems started. During reboots, my OSDs suddenly were no longer recognized in the `ceph -s` output. They were shown as down, even though I could see that they were up and running on their respective hosts. The reason for this: the OSDs do not seem to be updated with new MON addresses automatically, and they also ignore their host’s `ceph.conf` file.
Instead, they have their own conf file, located at `/var/lib/ceph/CLUSTER_ID/OSD_NAME/config`. The `CLUSTER_ID` here is the `id:` line in the `ceph -s` output, and `OSD_NAME` is for example `osd.1`. That file seems to be a `ceph.conf` file used by the OSDs. Just manually changing the MON addresses in there and restarting the daemons fixed the issue.
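As a concrete sketch of that fix on one OSD host, with placeholder cluster ID, OSD name and IPs, and the usual cephadm systemd unit naming (double-check the unit name on your own host):

```
# swap the old MON IP for the new one in the OSD's private config, then restart the daemon
sudo sed -i 's/192.0.2.11/192.0.2.21/g' /var/lib/ceph/CLUSTER_ID/osd.1/config
sudo systemctl restart ceph-CLUSTER_ID@osd.1.service
```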
I also observed that the NFS daemon I had running did not seem to be working anymore. It had the same problem, and the same fix worked.
A final comment on performance: It seems that Raspberry Pis manage the load of MON daemons just fine. I’ve got three of them hosting the MONs now, and they are also running Nomad, Consul and Vault servers. The CPU utilization seldom goes above 10%.