Wherein I migrate my Nextcloud instance to the k8s cluster.
This is part 20 of my k8s migration series.
Nextcloud is the oldest continuously running service in my Homelab. It started out as an OwnCloud deployment back when I still called my Homelab my “Heimserver”. It has been running continuously for more than ten years, and I quite like it.
Initially I only used it for file sharing between my devices and as a better alternative to a Samba share. Over the years, I also started using it for contacts and calendar sharing between my phone and desktop as well as sharing of my Firefox bookmarks between my laptop and desktop via Floccus. One perhaps somewhat surprising use case is for backups of OPNsense, which has support for backing up its configuration to Nextcloud out of the box.
My most recent use case was for notes sharing. When I’m researching something, say a new app I’d like to deploy in the Homelab, I like to plonk down in my armchair with my tablet. For a long time, I then had a problem with sharing notes between the tablet and my desktop. After some searching, I found Nextcloud’s Notes app. It isn’t the greatest note-taking app, but it does the job adequately for what I need, allowing me to paste some links and write some comments on them while lounging in my armchair.
Nextcloud configuration
I’ve been using Nextcloud’s community-led FPM image, which only contains Nextcloud itself, but no web server or anything else. For serving static assets and also just generally fronting Nextcloud, I’m using Caddy. For improved performance (or rather, reduced load), I’m also deploying the Rust-based notify_push app. It sends update notifications to connected clients, instead of the clients having to poll the server for changes.
Finally, Nextcloud needs some regular cleanup tasks to be executed. Being a PHP app without any scheduling capability, it needs the trigger for those regular tasks to come from outside the app itself. This can be configured in three ways:
- Running a task or two for every page load by a user
- Calling a dedicated URL regularly
- Setting up a cron job to call a dedicated PHP file
I’ve opted for option 2), because running a cron job in a container still doesn’t seem to be a solved problem, and I found that option 1) wasn’t enough, because I don’t actually visit the web interface that often.
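For the record, the active mode can also be switched explicitly with occ; a quick sketch, run inside the Nextcloud container (which already runs as the right user here):
php /var/www/html/occ background:webcron   # option 2); background:ajax and background:cron map to options 1) and 3)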
Then there’s also the question of data storage. A couple of years back, after I got my Ceph cluster up and running, I switched from a file-based backend to S3. This allowed me to stop worrying about partition sizes, at least. But this, like all too many things in Nextcloud, has its quirks. Most importantly: not all data gets stored in the S3 bucket. You still need to provide Nextcloud with a persistent volume, but at least it’s small: for my 10-15 year old instance, it’s only 29 MB worth of data. But still, it’s there.
Preparations
Preparing for the move, I had to set up three volumes.
The first one is the webapp volume. This volume will be mounted into all of the containers of the Pod, and it will contain Nextcloud’s /var/www/html directory, where the Nextcloud code lives. This needs to be an RWX volume, because it needs to be accessed by the Nextcloud FPM container, the Caddy container and the notify-push container. For this, I created a 10 GB CephFS PersistentVolumeClaim, as that doesn’t have any issues with concurrent access.
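As a sketch, the webapp claim looks roughly like this; the storageClassName is an assumption and depends on how your Rook/CephFS setup names it:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-webapp
spec:
  accessModes:
    - ReadWriteMany            # RWX, for the three containers sharing it
  storageClassName: cephfs     # assumption: name of the CephFS StorageClass
  resources:
    requests:
      storage: 10Gi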
The second volume is for the data. As noted above, this one should not need too much space due to me using S3 for storage, so it’s only 1 GB. And finally there’s a scratch volume for Caddy, which also needs a bit of local storage. But that’s even smaller than the data volume, at only 500 MB.
Nextcloud also needs a database, which I’m running on CloudNativePG again. I’ve described how I’m migrating databases in detail here.
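Since the Deployment below pulls its DB credentials from the nextcloud-pg-cluster-app Secret, which CloudNativePG generates automatically, the underlying Cluster manifest is tiny; a minimal sketch (instance count and size are assumptions):
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: nextcloud-pg-cluster
spec:
  instances: 2     # assumption
  storage:
    size: 5Gi      # assumption
# CNPG auto-creates the Secret "nextcloud-pg-cluster-app" with the
# dbname/host/port/user/password/uri keys referenced in the Deployment.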
Nextcloud’s deployment
The Nextcloud Deployment manifest is pretty long, due to the number of containers I’m running in the Pod. Here it is in its entirety; I will describe the pieces in detail below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nextcloud
spec:
replicas: 1
selector:
matchLabels:
homelab/app: nextcloud
strategy:
type: "Recreate"
template:
metadata:
labels:
homelab/app: nextcloud
annotations:
checksum/config-nc: {{ include (print $.Template.BasePath "/nextcloud-config.yaml") . | sha256sum }}
checksum/config-caddy: {{ include (print $.Template.BasePath "/caddy-config.yaml") . | sha256sum }}
spec:
automountServiceAccountToken: false
securityContext:
fsGroup: 33
runAsUser: 33
runAsGroup: 33
initContainers:
- name: nextcloud-init
image: alpine:3.21.2
volumeMounts:
- name: webapp
mountPath: /data
- name: nextcloud-config
mountPath: /config
command: ["cp", "/config/config.php", "/data/config/config.php"]
containers:
- name: nextcloud
image: nextcloud:{{ .Values.appVersion }}
volumeMounts:
- name: data
mountPath: /homenet-data/data
subPath: data
- name: webapp
mountPath: /var/www/html
resources:
requests:
cpu: 400m
memory: 2048Mi
envFrom:
- secretRef:
name: nextcloud-bucket
optional: false
- secretRef:
name: nextcloud-secrets
optional: false
- configMapRef:
name: nextcloud-bucket
optional: false
env:
- name: HL_REDIS_HOST
value: "redis.redis.svc.cluster.local"
- name: HL_REDIS_PORT
value: "6379"
- name: HL_DB_NAME
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: dbname
- name: HL_DB_HOST
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: host
- name: HL_DB_PORT
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: port
- name: HL_DB_USER
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: user
- name: HL_DB_PW
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: password
- name: nextcloud-push
image: nextcloud:{{ .Values.appVersion }}
command: ["/usr/bin/bash"]
args:
- "-c"
- "chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push"
volumeMounts:
- name: webapp
mountPath: /var/www/html
resources:
requests:
cpu: 200m
memory: 128Mi
env:
- name: NEXTCLOUD_URL
value: "https://nc.example.com"
- name: REDIS_URL
value: "redis://redis.redis.svc.cluster.local:6379"
- name: PORT
value: "{{ .Values.ports.notifyPush }}"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: uri
- name: nextcloud-web
image: caddy:{{ .Values.caddyVersion }}
volumeMounts:
- name: webapp
mountPath: /my-apps/nextcloud
readOnly: true
- name: webscratch
mountPath: /data
- name: caddy-config
mountPath: /etc/caddy
resources:
requests:
cpu: 400m
memory: 128Mi
livenessProbe:
httpGet:
port: {{ .Values.ports.caddy }}
path: "/"
initialDelaySeconds: 15
periodSeconds: 30
ports:
- name: nextcloud-http
containerPort: {{ .Values.ports.caddy }}
protocol: TCP
- name: nextcloud-cron
image: nextcloud:{{ .Values.appVersion }}
command: ["/usr/bin/bash"]
args:
- "/cron-scripts/webcron.sh"
volumeMounts:
- name: cron-script
mountPath: /cron-scripts
resources:
requests:
cpu: 50m
memory: 50Mi
env:
- name: SLEEPTIME
value: "5m"
- name: INITIAL_WAIT
value: "10m"
volumes:
- name: data
persistentVolumeClaim:
claimName: nextcloud-data
- name: webapp
persistentVolumeClaim:
claimName: nextcloud-webapp
- name: webscratch
persistentVolumeClaim:
claimName: nextcloud-webscratch
- name: nextcloud-config
configMap:
name: nextcloud-config
- name: caddy-config
configMap:
name: caddy-config
- name: cron-script
configMap:
name: cron-script
The first thing to discuss is the Nextcloud configuration file, which is just a PHP file, the config.php. It can be split, but I’ve always had it all in a single file and decided not to change that. In addition, while looking at the README of the container GitHub repo, I found that the image has some capability to do the entire configuration in environment variables. That’s something to look at later.
The configuration file has one big quirk: it needs to be writable by Nextcloud, at least during updates, because it contains the Nextcloud version, which I find an extremely weird thing to do. This leads to the manual step of updating the ConfigMap containing the config.php after an update is done.
Before I continue, I’d like to thank @rachel, who was kind enough to provide me with her Nextcloud manifests and especially her Nextcloud config file. The most important thing those taught me was the use of the getenv PHP function, so that I could provide all of the secrets as environment variables, instead of having to construct an elaborate external-secrets template.
As a consequence, my config.php ConfigMap now looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
name: nextcloud-config
data:
config.php: |
<?php
$CONFIG = array (
'apps_paths' =>
array (
0 =>
array (
'path' => '/var/www/html/apps',
'url' => '/apps',
'writable' => false,
),
1 =>
array (
'path' => '/var/www/html/custom_apps',
'url' => '/custom_apps',
'writable' => true,
),
),
'instanceid' => 'ID',
'datadirectory' => '/homenet-data/data',
'objectstore' => [
'class' => '\\OC\\Files\\ObjectStore\\S3',
'arguments' => [
'bucket' => getenv('BUCKET_NAME'),
'autocreate' => true,
'key' => getenv('AWS_ACCESS_KEY_ID'),
'secret' => getenv('AWS_SECRET_ACCESS_KEY'),
'hostname' => getenv('BUCKET_HOST'),
'port' => getenv('BUCKET_PORT'),
'use_ssl' => false,
'use_path_style'=>true
],
],
'trusted_domains' =>
array (
0 => 'nc.example.com',
1 => '127.0.0.1',
),
'trusted_proxies' =>
array (
0 => '127.0.0.1/32',
),
'memcache.local' => '\\OC\\Memcache\\Redis',
'redis' =>
array (
'host' => getenv('HL_REDIS_HOST'),
'port' => getenv('HL_REDIS_PORT'),
),
'memcache.locking' => '\\OC\\Memcache\\Redis',
'user_oidc' => [
'allow_multiple_user_backends' => 0,
'auto_provision' => false
],
'allow_local_remote_servers' => true,
'overwrite.cli.url' => 'https://nc.example.com',
'overwriteprotocol' => 'https',
'overwritewebroot' => '/',
'maintenance_window_start' => 100,
'default_phone_region' => 'DE',
'dbtype' => 'pgsql',
'version' => '30.0.6.2',
'dbname' => getenv('HL_DB_NAME'),
'dbhost' => getenv('HL_DB_HOST'),
'dbport' => getenv('HL_DB_PORT'),
'dbuser' => getenv('HL_DB_USER'),
'dbpassword' => getenv('HL_DB_PW'),
'dbtableprefix' => 'oc_',
'installed' => true,
'maintenance' => false,
'loglevel' => 2,
'logfile' => '/dev/stdout',
'log_type' => 'file',
'mail_domain' => 'example.com',
'mail_from_address' => 'nextcloud',
'mail_smtpmode' => 'smtp',
'mail_smtphost' => 'mail.example.com',
'mail_smtpport' => '465',
'mail_smtpsecure' => 'ssl',
'mail_smtpauth' => true,
'mail_smtpname' => 'nc@example.com',
'mail_smtppassword' => getenv('HL_MAIL_PW'),
'passwordsalt' => getenv('HL_PW_SALT'),
'secret' => getenv('HL_SECRET'),
);
One noteworthy piece here is the trusted_domains setting, which contains not only the actual domain Nextcloud is hosted on, but also 127.0.0.1. This is necessary because of the cron setup I will describe later.
I find this kind of configuration setup, where I can have a config file plus environment variables for secrets, quite convenient. It lets me have an actual config file, but it also allows me to extract the secrets without having to work with some sort of templating.
Another advantage of this setup, where I can define the names of config variables, is that I can use autogenerated Secrets directly, as you can see in the S3 setup:
'objectstore' => [
'class' => '\\OC\\Files\\ObjectStore\\S3',
'arguments' => [
'bucket' => getenv('BUCKET_NAME'),
'autocreate' => true,
'key' => getenv('AWS_ACCESS_KEY_ID'),
'secret' => getenv('AWS_SECRET_ACCESS_KEY'),
'hostname' => getenv('BUCKET_HOST'),
'port' => getenv('BUCKET_PORT'),
'use_ssl' => false,
'use_path_style'=>true
],
],
Here I was able to define the env variables in such a way that I could just use the ConfigMap and Secret generated by Rook via envFrom in the Deployment, instead of having to define every variable individually.
But as I’ve noted above, Nextcloud needs write access to the config file, so just mounting the ConfigMap into the container is not an option, because ConfigMaps are always mounted read-only. So I had to reach for the typical init container trick used in these situations and copy the config file from the ConfigMap into the webapp volume:
initContainers:
- name: nextcloud-init
image: alpine:3.21.2
volumeMounts:
- name: webapp
mountPath: /data
- name: nextcloud-config
mountPath: /config
command: ["cp", "/config/config.php", "/data/config/config.php"]
Next comes the Nextcloud container itself. The main thing I’d like to point out here is a gotcha that had me scratching my head for a little while. You can see that I set two env variables for Redis, HL_REDIS_HOST and HL_REDIS_PORT. When I first launched the Pod, those were called REDIS_HOST and REDIS_PORT, which just so happen to be the same environment variables that the image uses. It resulted in this error message:
Configuring Redis as session handler
/entrypoint.sh: 111: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
It made me pretty suspicious, because the ownership of the /usr hierarchy cannot have changed between Nomad and k8s, and the container was running with the same UID/GID as it was in the Nomad cluster. So why was I suddenly seeing this permission issue? I rummaged a bit through the Docker entrypoint of the image and found that the error message was coming from this piece of code:
if [ -n "${REDIS_HOST+x}" ]; then
echo "Configuring Redis as session handler"
{
file_env REDIS_HOST_PASSWORD
echo 'session.save_handler = redis'
# check if redis host is an unix socket path
if [ "$(echo "$REDIS_HOST" | cut -c1-1)" = "/" ]; then
if [ -n "${REDIS_HOST_PASSWORD+x}" ]; then
echo "session.save_path = \"unix://${REDIS_HOST}?auth=${REDIS_HOST_PASSWORD}\""
else
echo "session.save_path = \"unix://${REDIS_HOST}\""
fi
# check if redis password has been set
elif [ -n "${REDIS_HOST_PASSWORD+x}" ]; then
echo "session.save_path = \"tcp://${REDIS_HOST}:${REDIS_HOST_PORT:=6379}?auth=${REDIS_HOST_PASSWORD}\""
else
echo "session.save_path = \"tcp://${REDIS_HOST}:${REDIS_HOST_PORT:=6379}\""
fi
echo "redis.session.locking_enabled = 1"
echo "redis.session.lock_retries = -1"
# redis.session.lock_wait_time is specified in microseconds.
# Wait 10ms before retrying the lock rather than the default 2ms.
echo "redis.session.lock_wait_time = 10000"
} > /usr/local/etc/php/conf.d/redis-session.ini
fi
That sets up some Redis session handler configuration, and I inevitably ran into it because I had named my env variables the same as the image’s. The error went away when I renamed the env variables to have the HL_ prefix, so they no longer hit the if above.
Additionally noteworthy is the fact that the Nextcloud container doesn’t expose any port; only the Caddy web server does, and it proxies all requests targeting PHP files to the Nextcloud container.
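Consequently, a Service for this Pod only needs to expose Caddy’s port. A minimal sketch reusing the Deployment’s labels and named port (the Service name and external port are assumptions, not from my actual manifests):
apiVersion: v1
kind: Service
metadata:
  name: nextcloud                    # assumption
spec:
  selector:
    homelab/app: nextcloud
  ports:
    - name: http
      port: 80                       # assumption
      targetPort: nextcloud-http     # Caddy's named containerPort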
That Caddy container looks like this:
- name: nextcloud-web
image: caddy:{{ .Values.caddyVersion }}
volumeMounts:
- name: webapp
mountPath: /my-apps/nextcloud
readOnly: true
- name: webscratch
mountPath: /data
- name: caddy-config
mountPath: /etc/caddy
resources:
requests:
cpu: 400m
memory: 128Mi
livenessProbe:
httpGet:
port: {{ .Values.ports.caddy }}
path: "/"
initialDelaySeconds: 15
periodSeconds: 30
ports:
- name: nextcloud-http
containerPort: {{ .Values.ports.caddy }}
protocol: TCP
It doesn’t need any of the Secrets and environment variables that the Nextcloud container needs, and gets its configuration from a Caddyfile:
apiVersion: v1
kind: ConfigMap
metadata:
name: caddy-config
data:
Caddyfile: |
{
admin off
auto_https off
log {
output stdout
level INFO
}
servers {
trusted_proxies static 127.0.0.1/32 203.0.113.1/32
}
}
:{{ .Values.ports.caddy }} {
root * /my-apps/nextcloud
file_server
log {
output stdout
format filter {
wrap json
fields {
request>headers>Authorization delete
request>headers>Cookie delete
}
}
}
route /push/* {
uri strip_prefix /push
reverse_proxy http://localhost:{{ .Values.ports.notifyPush }}
}
@provider-matcher {
path_regexp ^\/(?:updater|oc[ms]-provider)(?:$|\/)
}
rewrite @provider-matcher {path}/index.php
@php-matcher {
path_regexp ^\/(?:index|remote|public|cron|core\/ajax\/update|status|ocs\/v[12]|updater\/.+|oc[ms]-provider\/.+)\.php(?:$|\/)
}
php_fastcgi @php-matcher localhost:9000 {
root /var/www/html
}
redir /.well-known/carddav /remote.php/dav/ 301
redir /.well-known/caldav /remote.php/dav/ 301
redir /.well-known/webfinger /index.php{uri} 301
redir /.well-known/nodeinfo /index.php{uri} 301
@forbidden {
path /.htaccess
path /.user.ini
path /3rdparty/*
path /authors
path /build/*
path /config/*
path /console*
path /copying
path /data/*
path /db_structure
path /lib/*
path /occ
path /README
path /templates/*
path /tests/*
path /console.php
}
respond @forbidden 404
}
This config does a couple of things. First, it defines the webapp volume as the HTTP root and serves its content directly, so Caddy handles Nextcloud’s static assets. An important piece is the log config, which removes secret data like cookies and auth headers from the request log. Then there are a number of routes: the first proxies requests for the notify-push backend to that container’s port. Then there’s a rewrite of some “special” paths to PHP, and the general PHP matcher, which forwards all PHP file requests to the Nextcloud container. And finally, a couple of explicitly forbidden paths containing files that shouldn’t be externally accessible.
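A quick external sanity check of those matchers, using the sanitized domain from above:
# the .well-known redirects should answer with a 301 to /remote.php/dav/
curl -I https://nc.example.com/.well-known/carddav
# the forbidden matcher should answer with a 404
curl -I https://nc.example.com/.htaccess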
Then there’s the nextcloud-push container:
- name: nextcloud-push
image: nextcloud:{{ .Values.appVersion }}
command: ["/usr/bin/bash"]
args:
- "-c"
- "chmod u+x /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push; /var/www/html/custom_apps/notify_push/bin/$(uname -m)/notify_push"
volumeMounts:
- name: webapp
mountPath: /var/www/html
resources:
requests:
cpu: 200m
memory: 128Mi
env:
- name: NEXTCLOUD_URL
value: "https://cloud.mei-home.net"
- name: REDIS_URL
value: "redis://redis.redis.svc.cluster.local:6379"
- name: PORT
value: "{{ .Values.ports.notifyPush }}"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: nextcloud-pg-cluster-app
key: uri
The notify-push component, which is separate from Nextcloud’s main codebase, is their attempt to solve some performance issues. Normally, clients have to proactively poll the server for changed files. This becomes inefficient fast, even in a small deployment like mine with at most three connected clients. In contrast to the majority of Nextcloud, this component is written in Rust for performance reasons. I’ve just checked and did not see much change in the CPU usage of my Nomad Nextcloud deployment after deploying notify-push for the first time, but I still figure: why not.
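One thing worth noting for anyone setting this up fresh: Nextcloud itself also has to be told where the push server is reachable. I had already done this back on Nomad; going by the notify_push README, the app ships a setup command roughly like this, with the /push path matching the Caddy route above:
php /var/www/html/occ notify_push:setup https://nc.example.com/push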
There were a couple of problems with this deployment though. The very first one was the fact that the Rust binaries are located in per-arch directories. In Nomad that wasn’t a problem; I could define the command’s path like this:
/var/www/html/custom_apps/notify_push/bin/${attr.kernel.arch}/notify_push
Nomad would replace ${attr.kernel.arch} with the CPU architecture of the node the job got scheduled on.
I was 100% sure that Kubernetes would have something similar. In fact, I knew it did. The information is stored in the kubernetes.io/arch label. And you can get labels into env variables with the Downward API, and then you can use env variables in the command via the $(ENV_VAR) syntax.
The problem: the arch label is defined on nodes, not on Pods. And the Downward API only allows access to Pod labels, not node labels. So I finally had to reach for the uname -m you see above. I was really surprised that k8s doesn’t have the capability to inject the node’s arch into a container’s env.
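For illustration, this is the pattern that would have been nice to use; the container here is purely hypothetical, and the fieldPath is limited to the Pod’s own metadata:
- name: arch-example                   # hypothetical illustration container
  image: alpine:3.21.2
  env:
    - name: APP_LABEL
      valueFrom:
        fieldRef:
          # Pod labels work; there is no fieldPath that reaches node labels
          # like kubernetes.io/arch.
          fieldPath: metadata.labels['homelab/app']
  # $(VAR) expansion in command/args works for env vars defined above
  command: ["/bin/sh", "-c", "echo label is $(APP_LABEL)"]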
But that wasn’t the end of my notify-push problems. Now that the container was finally able to execute the binary, it errored out like this:
Error: php_literal_parser::unexpected_token
× Error while parsing nextcloud config.php
╰─▶ Error while parsing '/var/www/html/config/config.php':
No valid token found, expected one of boolean literal, integer literal,
float literal, string literal, 'null', 'array' or '['
╭─[22:31]
21 │ 'arguments' => [
22 │ 'bucket' => getenv('BUCKET_NAME'),
· ┬
· ╰── Expected boolean literal, integer literal, float literal, string literal, 'null', 'array' or '['
23 │ 'autocreate' => true,
╰────
Before I had the version using environment variables to provide the Nextcloud configs needed by the notify-push app, I was providing the config.php file directly, which is supposed to work as well. I figured I had the file already anyway, so why not use it?
But it looks like the PHP parser used by notify-push is not capable of actually executing PHP; it expects all config options to be set to static values. That’s why I ended up using the environment variables supported by the notify-push binary to set the necessary configuration options.
After all of that, the Pod finally fully started, and I was able to log in and got all of my files, calendars, contacts and so on. I also went through the warnings shown in the admin interface and had one issue I’d like to note here. The errors told me that my mail settings had not been tested, so I went into them and clicked the “send test mail” button. This showed an error immediately:
AxiosError: Request failed with status code 400
I had absolutely no idea what it meant, as I knew that my mail server was working as intended. It turned out that the issue wasn’t with the mail server or the Nextcloud mail config, but just the fact that I had never set a mail address for the admin account I was working in. 🤦
The last piece of the puzzle is the cron container. As I’ve described above, Nextcloud needs some regularly executed tasks. I’m not enough of a web developer to really have any experience with PHP, but from what I understand, PHP is request-oriented, so it doesn’t have a convenient place to run scheduled tasks. Anyway, I needed some way to regularly call the cron.php file to trigger these regular maintenance tasks. The Nextcloud docs recommend hitting the cron.php file every five minutes. For that, I re-used the Nextcloud container, because it already has everything needed on board:
- name: nextcloud-cron
image: nextcloud:{{ .Values.appVersion }}
command: ["/usr/bin/bash"]
args:
- "/cron-scripts/webcron.sh"
volumeMounts:
- name: cron-script
mountPath: /cron-scripts
resources:
requests:
cpu: 50m
memory: 50Mi
env:
- name: SLEEPTIME
value: "5m"
- name: INITIAL_WAIT
value: "10m"
But instead of launching php-fpm, I run a simple bash script:
apiVersion: v1
kind: ConfigMap
metadata:
name: cron-script
labels:
{{- range $label, $value := .Values.commonLabels }}
{{ $label }}: {{ $value | quote }}
{{- end }}
data:
webcron.sh: |
#!/bin/bash
echo "$(date): Launched task, sleeping for ${INITIAL_WAIT}"
sleep "${INITIAL_WAIT}"
while true; do
curl http://127.0.0.1/cron.php 2>&1
echo ""
echo "$(date): Sleeping for ${SLEEPTIME}"
sleep "${SLEEPTIME}"
done
This does the job nicely, while staying pleasantly simple.
Conclusion
This one went quite well. I was expecting more problems, especially considering that it sometimes looks like mine is the only Nextcloud deployment in the Homelabbing community that runs without any issues. 😅 I intentionally chose to not muck about with the setup too much and instead copied my Nomad setup as closely as possible, which made for a relatively smooth migration. I was reluctant to change too much, because I rely on Nextcloud for a lot of my “I would rather not be without this for more than a weekend” needs. So being a bit conservative with how much I change was in order.
I haven’t decided what comes next yet - I might spend next week finishing some blog post drafts instead of starting anything new, because at this point I’ve mostly got “finish during the weekend because I need it during the week” stuff left in the migration.