Wherein I’m adding Bookwyrm to my Homelab.

I used to read novels. A lot. On school days, I would spend the approximately twenty minutes between the end of my morning routine and having to head off with a novel. Ditto for lazy Sunday evenings. During my service as a conscript, I would always find space for a book in my pack when we went on a training exercise. At University, the most difficult decision while packing for a trip home would be judging how many books I would need to pack to ensure I would not run out.

Getting my first Kindle in 2012 was a revolution. Suddenly, I didn’t need to think very hard anymore - I could take my entire library with me. 🎉

But for the last couple of years, my reading has slowly dwindled. So taking a break from my attempts to set up Tinkerbell, I decided to set up Bookwyrm, the Fediverse alternative to Goodreads.

Which, in hindsight, looks a bit weird: I want to read more novels. So first thing to do is more homelabbing. 😅

Bookwyrm

So, what does Bookwyrm look like? While I called it the Fediverse Goodreads alternative, I had never actually used Goodreads, so I wasn’t sure exactly what I was getting myself into.

Here is what my home timeline looks like in Bookwyrm:

A screenshot of my Bookwyrm Home Timeline. At the top is a menu with Lists, Discover and 'Your Books' entries, as well as a search field, and on the far right is a profile picture and a dropdown menu with settings. Below on the left is a carousel with my books, first those I'm currently reading, then two books I've finished reading and finally a book I'm wanting to read. For each book, its cover is shown. I will go into detail on which books these are in the main post. Below the carousel are some controls for the selected book. It shows the title and a button labeled 'Finish reading', because the selected book is in my 'Currently Reading' shelf. Below that are tabs for writing a review of the book, another tab for adding a general comment, and finally one for posting a quote. Below the text box for entering my review is a button for posting, next to a dropdown for choosing post visibility. In the main part of the screen is my timeline, currently filled with my own posts in chronological order. At the top, the most recent post names a book I want to read, including its title, the series it belongs to and its cover. That post has the typical options, namely replying to it, boosting it and liking it. The next one marks a post about me finishing reading a book as having been boosted by my Mastodon account. At the bottom is another carousel, headed 'Who to follow' with a couple of proposed accounts, represented by their profile pictures.

My Home Timeline

This represents Bookwyrm pretty nicely. Its core function is socializing about books, so all interactions revolve around them. I believe there are private messages which can simply be sent to another user, but there is no generic, Mastodon-like microblogging. In the above example, you can see two of my posts. The top one represents me marking The Three-Body Problem as a book I want to read. The post below it is a boost from my Mastodon account, where I mark False Gods as finished.

On the left of the screenshot is the new post interface, which reinforces what I wrote above: Bookwyrm is all about books. The new post interface is not just a text box I can write anything in; it is instead made up of actions related to the selected book. For my English-speaking readers, the title roughly translates to “Fateful Hour of a Democracy”. It’s a book about the history of the Weimar Republic, that short period in German history that should get a hell of a lot more emphasis in history lessons than what came before or after it, but sadly doesn’t.

Back to Bookwyrm: I can write a review of the book, including a 0-5 star score, post a general comment, or share a quote from it. So all actions I can take relate to the book itself.

Each book also gets its own page, which looks like this:

A screenshot of Bookwyrm's book page for 'On Basilisk Station' by David Weber. Below the title is the name of the series, 'Honor Harrington', and the number 1, indicating that it's the first book. Both the series and the Author name are highlighted to indicate they are links. On the left side, it shows the book's cover. In this case, of a woman in a military uniform, with a spaceship firing a laser beam in the background. Below that is the rating, full five stars in this case. Then comes some general information about the book, including page count (422 pages), the language, the publishing date and the ISBN. On the right, the main part of the page starts with a description of the book. At the bottom of it is a link indicating nine more editions of the book being available. Then comes a section headed 'You have shelved this edition in', and it shows the 'Read' shelf. Then comes a 'Your reading activity' section, showing that I started reading this book on August 1st 2004 and finished on August 24th. Below that, the top of new post section I described in the previous section is visible.

An example of a book page

Scrolling further down shows the reviews for the book:

Another Bookwyrm screenshot, this time showing the bottom of the book page. There are multiple tabs, one for 'Reviews' and one for 'Your reviews'. Both just have a single entry, a review from me about the book and the Honor Harrington series overall. Below the review are buttons for boosting, replying and liking.

Bottom of the book page, with a review

What I find a bit sad is that it only shows the related reviews and posts, but the automatically created post about me starting to read the book is nowhere to be found.

Another problem is finding the “instance” of a book. Here is a screenshot of searching for “On Basilisk Station” in Bookwyrm:

A screenshot of Bookwyrm's book search results for 'On Basilisk Station'. It shows a variety of results from different Bookwyrm instances. All of them vary in author name, publication date, cover art, and full book title, with some containing 'Honor Harrington', the series name, as part of the title.

Bookwyrm book search results

One of the good things here is that it returned the right results: they’re all for the correct book. Something I haven’t shown here is that the initial result only contains the book from my own instance, but the search can then be broadened to other sources. Besides Bookwyrm instances, the search also looks at other sites like Inventaire and OpenLibrary.

On instances better federated than mine, the book page for the same book looks a bit more lively:

Another screenshot of Bookwyrm's book page for 'On Basilisk Station', but this time from another instance than mine. The cover art and the description of the book are different. But besides my lone review, it shows reviews by multiple other people below the book's description. In addition, at the bottom of the page, there is a list of a number of other ratings, without full reviews, from a number of users. Each shows the user's name, profile pic, their rating of the book, and the date they read it.

The page for the same book as before, but now from books.theunseen.city.

This example comes from books.theunseen.city. So with more connections, the book page will fill up on my instance as well.

And that’s it for the Bookwyrm tour. I still haven’t dived deeply into it, and I’m currently following only one other person. But I already like it as a way for people to follow what I’m reading. Let’s see what the future holds.

Deploying Bookwyrm on Kubernetes

Let’s get on with the technical part. I of course wanted to deploy Bookwyrm in my Kubernetes cluster. But its default docs are geared towards deployment with docker-compose. And the instructions contain some “please run this script…” steps which I had to integrate into my setup, so that I wouldn’t have to rely on remembering commands documented somewhere.

But the first step had to be to create a container image, as the Bookwyrm project itself does not supply one.

Image creation

I took the container build instructions from the official Dockerfile and added the image to my CI. In the process, I completely remade my container image build setup, see this post if you’re interested.

The ultimate version of the image build looks like this:

ARG python_ver
FROM python:${python_ver}

ENV PYTHONUNBUFFERED 1

RUN mkdir /app /app/static /app/images

WORKDIR /app

RUN apt-get update && apt-get install -y gettext libgettextpo-dev tidy libsass-dev && apt-get clean

COPY . /app
RUN env SYSTEM_SASS="true" pip install -r requirements.txt --no-cache-dir

I made two important changes compared to the official Dockerfile. First, the official docker-compose deployment just mounts the Bookwyrm source code into the container at runtime to make it available. I wanted the image to be self-contained, so instead of only copying the requirements.txt file, I copied the entire source code into the /app directory.

Another change is the addition of libsass-dev to the installed packages, and adding the SYSTEM_SASS="true" variable to the pip invocation installing the dependencies. I found this to be required due to the arm64 image build. During the amd64 build, a prebuilt wheel is available for the libsass package. But no wheel seems to be available for arm64, so the C++ libsass gets built as part of the pip invocation. This takes quite a while on a Pi 4, especially as the compile only seems to use one core. The builds looked like this:

A screenshot of Woodpecker's pipeline overview. It shows a Bookwyrm image build, running for a total of 23 minutes. It has two build steps, one for amd64 and one for arm64. The amd64 image took 05:23 minutes, while the arm64 build of the same image took 17:30.

Image build for Bookwyrm without the system libsass.

The arm64 build took pretty much 3x as long as the amd64 build. Sure, some of it can be attributed to the arm64 builds running on a Raspberry Pi 4. But the main contributing factor was that libsass needed to be rebuilt for arm64, but not for amd64. After I started using the system libsass, this is what the build times looked like:

Another screenshot of Woodpecker's pipeline overview. It again shows the Bookwyrm image build, but while the amd64 build still takes a comparable 5:40 minutes, the arm64 build now only takes 10:38 minutes. Still a lot longer, but no longer quite as bad.

Some improvements of the image build times after I started using the system libsass instead of letting pip build it.
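As a quick sanity check on those numbers, the arm64/amd64 ratios work out like this:

```python
# Ratio of arm64 to amd64 build time, before and after switching to
# the system libsass, using the durations from the two pipeline runs.
before = (17 * 60 + 30) / (5 * 60 + 23)   # 17:30 vs 05:23
after = (10 * 60 + 38) / (5 * 60 + 40)    # 10:38 vs 05:40
print(f"{before:.2f}x -> {after:.2f}x")   # prints "3.25x -> 1.88x"
```

So the penalty for building on arm64 dropped from over 3x to under 2x, with the remainder presumably down to the Pi 4 itself.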

Good enough for now.

But there was one issue remaining: As you can see, I’m copying the Bookwyrm code into the image. But I had to get that code from somewhere first, and I wanted to have it in my Homelab, instead of fetching it from GitHub every time. So I created a mirror on my Forgejo instance. That brought a new question: How to fetch that repo from Forgejo from within a Woodpecker job? I could certainly have made it a public repo and just fetched it, but I figured I would try to do it properly and fetch it with credentials.

But where to get the credentials from? I didn’t want to manually add them to the repo config in Woodpecker; after all, Woodpecker must already have credentials, because it has to fetch the container image repo where I keep the Containerfile for the Bookwyrm image. Reading up a bit, I found the environment variable docs for Woodpecker. These contain the CI_NETRC_USERNAME and CI_NETRC_PASSWORD variables. These are set to the credentials needed to fetch from the git forge configured for the repository in Woodpecker. Note that the docs say this:

Credentials for private repos to be able to clone data. (Only available for specific images)

Sadly, it doesn’t say which images get a netrc file with the credentials mounted. I found more docs here, mentioning trusted clone plugins. I tried to build a small Alpine image with git installed, but still didn’t manage to get the credentials into that image. The error message always read:

fatal: could not read Username for 'https://forgejo.example.com': No such device or address
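For context, what git is missing here is a netrc file with credentials for the forge. Python’s stdlib parses the same format, which makes it easy to illustrate; the host and credentials below are made-up placeholders:

```python
import netrc
import os
import tempfile

# The netrc format git consults for HTTPS credentials; Woodpecker mounts
# a file like this into trusted clone plugins. Host, user and password
# here are made-up placeholders.
content = "machine forgejo.example.com\nlogin ci-user\npassword s3cret\n"
path = os.path.join(tempfile.mkdtemp(), "netrc")
with open(path, "w") as f:
    f.write(content)

# authenticators() returns a (login, account, password) tuple for the host.
login, _account, password = netrc.netrc(path).authenticators("forgejo.example.com")
print(login, password)  # prints "ci-user s3cret"
```

Without such a file, git has no way to answer the username prompt non-interactively, hence the error above.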

I then dug through the code to find the check and see why my new Alpine image didn’t get the netrc credentials. I found this function:

func (c *Container) IsTrustedCloneImage(trustedClonePlugins []string) bool {
	return c.IsPlugin() && utils.MatchImageDynamic(c.Image, trustedClonePlugins...)
}

Note that it doesn’t just check the image, but also verifies that the step is a plugin, not just an image executing commands. Instead of building a plugin, I decided to try to work with the official clone plugin, which is also used to clone the initial repository for a Woodpecker pipeline run. This ultimately worked, and the step for fetching the Bookwyrm repo mirror from my Forgejo looks like this:

  - name: clone bookwyrm repo
    image: woodpeckerci/plugin-git
    settings:
      depth: 1
      tags: true
      branch: production
      partial: false
      remote: https://forgejo.example.com/mirrors/bookwyrm.git
      ref: 'v0.7.5'
      path: /woodpecker/bookwyrm

Note that the /mirrors/ part of the URL is not necessary to use it as a mirror, I just put my Forgejo mirrors into a group called mirrors.

And with this, I ended up with the Bookwyrm repo, checked out at tag v0.7.5, available under /woodpecker/bookwyrm for the rest of the pipeline steps.

Getting to the point of having the Bookwyrm image was quite a ride, but now it’s time for the actual Kubernetes deployment.

Kubernetes deployment

When it comes to dependencies, Bookwyrm requires a Postgres DB and Redis, plus it supports an S3 bucket for media and other static assets. I will not go into detail on those dependencies. If you’re curious about how I’m setting them up in my Homelab, here are the two relevant posts:

Looking at Bookwyrm’s setup docs, it requires executing a script during the initial deployment.

Initialize the database by running ./bw-dev migrate

And:

Initialize the application with ./bw-dev setup, and copy the admin code to use when you create your admin account.

So I needed to somehow integrate that into my setup. Looking at the bw-dev script, it became pretty clear that Bookwyrm is really geared towards a docker-compose deployment. The script is intended to be run outside of the Bookwyrm container, as indicated by the fact that it calls docker-compose to achieve things:

[...]
function runweb {
    $DOCKER_COMPOSE run --rm web "$@"
}
[...]
function initdb {
    runweb python manage.py initdb "$@"
}

function migrate {
    runweb python manage.py migrate "$@"
}

function admin_code {
    runweb python manage.py admin_code
}

This of course won’t work in a Kubernetes deployment. To work around this, I wrote my own script, using the manage.py commands directly, without calling the bw-dev script. It ended up looking like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: bookwyrm-script
  labels:
    {{- range $label, $value := .Values.commonLabels }}
    {{ $label }}: {{ $value | quote }}
    {{- end }}
data:
  bookwyrm.sh: |
    #! /bin/bash

    migrate() {
      python manage.py migrate "$@" || return 1
    }

    initdb() {
      python manage.py initdb "$@" || return 1
    }

    init() {
      echo "Running init function..."
      migrate || return 1
      migrate "django_celery_beat" || return 1
      initdb || return 1
      python manage.py compile_themes || return 1
      python manage.py collectstatic --no-input || return 1
      python manage.py admin_code || return 1
      return 0
    }

    update() {
      echo "Running update function..."
      migrate || return 1
      python manage.py compile_themes || return 1
      python manage.py collectstatic --no-input || return 1

      return 0
    }

    op="${1}"
    if [[ "${op}" == "init" ]]; then
      init || exit 1
    elif [[ "${op}" == "update" ]]; then
      update || exit 1
    else
      echo "Unknown operation ${op}, aborting."
      exit 1
    fi

    exit 0    

This script supports two operations: the first-deployment initialization, via bookwyrm.sh init, and the migrations potentially required during updates, via bookwyrm.sh update.

Next question: how to run the script? For that, I looked into Helm chart hooks. These are annotations placed on a template in a Helm chart which make Helm instantiate the template only under certain circumstances. Hooks are available for all phases of the Helm chart lifecycle, from install to upgrade to delete.

I sadly couldn’t make use of the post-install hook for the init part of the Bookwyrm script, because the chart also contains the CloudNativePG and S3 bucket templates, and I had already installed that part of it. So for the init step, I opted for a simple workaround. The Job’s manifest looks like this:

{{- if .Values.runInit }}
apiVersion: batch/v1
kind: Job
metadata:
  name: bookwyrm-init
  labels:
    {{- range $label, $value := .Values.commonLabels }}
    {{ $label }}: {{ $value | quote }}
    {{- end }}
spec:
  template:
    metadata:
      name: bookwyrm-init
      labels:
        {{- range $label, $value := .Values.commonLabels }}
        {{ $label }}: {{ $value | quote }}
        {{- end }}
    spec:
      restartPolicy: Never
      containers:
        - name: init-script
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["bash"]
          args:
            - /hl/bookwyrm.sh
            - init
          volumeMounts:
            - name: bookwyrm-script
              mountPath: /hl
              readOnly: true
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
      volumes:
        - name: bookwyrm-script
          configMap:
            name: bookwyrm-script
{{- end }}

So it only gets created when the value runInit is true in the values.yaml file.

But for the update Job, which does DB migrations and regenerates static assets, I was able to use the pre-upgrade hook. The manifest looks like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: bookwyrm-update
  labels:
    {{- range $label, $value := .Values.commonLabels }}
    {{ $label }}: {{ $value | quote }}
    {{- end }}
  annotations:
    "helm.sh/hook": pre-upgrade
spec:
  template:
    metadata:
      name: bookwyrm-update
      labels:
        {{- range $label, $value := .Values.commonLabels }}
        {{ $label }}: {{ $value | quote }}
        {{- end }}
    spec:
      restartPolicy: Never
      containers:
        - name: update-script
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["bash"]
          args:
            - /hl/bookwyrm.sh
            - update
          volumeMounts:
            - name: bookwyrm-script
              mountPath: /hl
              readOnly: true
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
      volumes:
        - name: bookwyrm-script
          configMap:
            name: bookwyrm-script

Note especially this part:

metadata:
  annotations:
    "helm.sh/hook": pre-upgrade

That is what marks the Job as a hook to be run before anything else is updated.

The upgrade hook has one unfortunate semantic though: it is launched whenever the Helm chart is updated, not just when the Bookwyrm version is incremented. That means any change to the chart, even just an added label, will execute the Job. And it executes during the helm upgrade run, before anything else. So you run helm upgrade, and Helm won’t return immediately; it waits for the hook to finish and only then updates all of the other manifests, where necessary. These Helm runs will consequently take a bit longer. But that still seems a relatively small price compared to having the instructions on a documentation page I need to remember to execute whenever Bookwyrm is updated.

Here is some of the output of my run of the Bookwyrm initialization:

Running init function...
Operations to perform:
  Apply all migrations: admin, auth, bookwyrm, contenttypes, django_celery_beat, oauth2_provider, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  [...]
Operations to perform:
  Apply all migrations: django_celery_beat
Running migrations:
  No migrations to apply.
  Your models in app(s): 'bookwyrm' have changes that are not yet reflected in a migration, and so won't be applied.
  Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
Compiled SASS/SCSS file: '/app/bookwyrm/static/css/themes/bookwyrm-dark.scss'
Compiled SASS/SCSS file: '/app/bookwyrm/static/css/themes/bookwyrm-light.scss'
257 static files copied.
*******************************************
Use this code to create your admin account:
1234-56-78-910-111213
*******************************************

The last part especially is important, as that code is needed to create the initial admin account.

With that done, I was finally ready to write the Deployment. For that, I took the official docker-compose file as a blueprint:

services:
  nginx:
    image: nginx:1.25.2
    restart: unless-stopped
    ports:
      - "1333:80"
    depends_on:
      - web
    networks:
      - main
    volumes:
      - ./nginx:/etc/nginx/conf.d
      - static_volume:/app/static
      - media_volume:/app/images
  db:
    image: postgres:13
    env_file: .env
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - main
  web:
    build: .
    env_file: .env
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - db
      - celery_worker
      - redis_activity
    networks:
      - main
    ports:
      - "8000"
  redis_activity:
    image: redis:7.2.1
    command: redis-server --requirepass ${REDIS_ACTIVITY_PASSWORD} --appendonly yes --port ${REDIS_ACTIVITY_PORT}
    volumes:
      - ./redis.conf:/etc/redis/redis.conf
      - redis_activity_data:/data
    env_file: .env
    networks:
      - main
    restart: on-failure
  redis_broker:
    image: redis:7.2.1
    command: redis-server --requirepass ${REDIS_BROKER_PASSWORD} --appendonly yes --port ${REDIS_BROKER_PORT}
    volumes:
      - ./redis.conf:/etc/redis/redis.conf
      - redis_broker_data:/data
    env_file: .env
    networks:
      - main
    restart: on-failure
  celery_worker:
    env_file: .env
    build: .
    networks:
      - main
    command: celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - db
      - redis_broker
    restart: on-failure
  celery_beat:
    env_file: .env
    build: .
    networks:
      - main
    command: celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - celery_worker
    restart: on-failure
  flower:
    build: .
    command: celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower
    env_file: .env
    volumes:
      - .:/app
      - static_volume:/app/static
    networks:
      - main
    depends_on:
      - db
      - redis_broker
    restart: on-failure
  dev-tools:
    build: dev-tools
    env_file: .env
    volumes:
      - /app/dev-tools/
      - .:/app
    profiles:
      - tools
volumes:
  pgdata:
  static_volume:
  media_volume:
  exports_volume:
  redis_broker_data:
  redis_activity_data:
networks:
  main:

It’s a pretty long one, so let’s go through it service by service. I skipped the Nginx deployment entirely, as I’m using Bookwyrm’s S3 support for static assets and images, and with that, Nginx doesn’t seem to be necessary. For the same reason, I also don’t have any volumes for /app/static and /app/images. I initially had volumes there, as the docs were not 100% clear on whether the directories might still be used even with S3, but after a couple of days of running Bookwyrm, I found them still empty and removed the volumes. I ignored the dev-tools service, as it seemed unnecessary. Finally, I skipped the redis_activity, redis_broker and db services, as I had already covered those with CloudNativePG and my existing Redis instance.

That left me with the following services to run:

services:
  web:
    build: .
    env_file: .env
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - db
      - celery_worker
      - redis_activity
    networks:
      - main
    ports:
      - "8000"
  celery_worker:
    env_file: .env
    build: .
    networks:
      - main
    command: celery -A celerywyrm worker -l info -Q high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - db
      - redis_broker
    restart: on-failure
  celery_beat:
    env_file: .env
    build: .
    networks:
      - main
    command: celery -A celerywyrm beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler
    volumes:
      - .:/app
      - static_volume:/app/static
      - media_volume:/app/images
      - exports_volume:/app/exports
    depends_on:
      - celery_worker
    restart: on-failure
  flower:
    build: .
    command: celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower
    env_file: .env
    volumes:
      - .:/app
      - static_volume:/app/static
    networks:
      - main
    depends_on:
      - db
      - redis_broker
    restart: on-failure
networks:
  main:

One thing to note is that they all use the same .env file; Bookwyrm’s stack is mostly configured via environment variables, which I applaud. To avoid copying the env block for each container, I added this section to my values.yaml file:

env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: DEBUG
    value: "false"
  - name: ALLOWED_HOSTS
    value: "bookwyrm.example.com,localhost,$(POD_IP)"
  - name: SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: secret-key
        key: key
  - name: DOMAIN
    value: "bookwyrm.example.com"
  - name: USE_HTTPS
    value: "true"
  - name: PGPORT
    valueFrom:
      secretKeyRef:
        name: bookwyrm-pg-cluster-app
        key: port
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: bookwyrm-pg-cluster-app
        key: password
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: bookwyrm-pg-cluster-app
        key: user
  - name: POSTGRES_DB
    valueFrom:
      secretKeyRef:
        name: bookwyrm-pg-cluster-app
        key: dbname
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: bookwyrm-pg-cluster-app
        key: host
  - name: REDIS_ACTIVITY_URL
    value: "redis://redis.redis.svc.cluster.local:6379/0"
  - name: REDIS_BROKER_URL
    value: "redis://redis.redis.svc.cluster.local:6379/1"
  - name: FLOWER_USER
    valueFrom:
      secretKeyRef:
        name: flower
        key: user
  - name: FLOWER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: flower
        key: pw
  - name: FLOWER_BASIC_AUTH
    value: "$(FLOWER_USER):$(FLOWER_PASSWORD)"
  - name: FLOWER_PORT
    value: "8888"
  - name: EMAIL_HOST
    value: "mail.example.com"
  - name: EMAIL_PORT
    value: "465"
  - name: EMAIL_HOST_USER
    value: "bookwyrm@example.com"
  - name: EMAIL_HOST_PASSWORD
    valueFrom:
      secretKeyRef:
        name: mail-pw
        key: pw
  - name: EMAIL_SENDER_NAME
    value: "bookwyrm"
  - name: EMAIL_SENDER_DOMAIN
    value: "example.com"
  - name: USE_S3
    value: "true"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: bookwyrm-bucket
        key: AWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: bookwyrm-bucket
        key: AWS_SECRET_ACCESS_KEY
  - name: AWS_STORAGE_BUCKET_NAME
    valueFrom:
      configMapKeyRef:
        name: bookwyrm-bucket
        key: BUCKET_NAME
  - name: AWS_S3_CUSTOM_DOMAIN
    value: "s3-bookwyrm.example.com"
  - name: AWS_S3_ENDPOINT_URL
    value: "http://rook-ceph-rgw-rgw-bulk.rook-cluster.svc"
  - name: ENABLE_THUMBNAIL_GENERATION
    value: "true"

I won’t go through all of the options, but there are a few I would like to highlight. First, the POD_IP setting is important for Kubernetes probes to work. They access the pod via its IP by default, and for Django apps, that IP needs to be explicitly included in ALLOWED_HOSTS. I’ve had a similar issue with Paperless-ngx before, which is also a Django app.
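As a minimal sketch of what goes wrong otherwise: Kubernetes expands $(POD_IP) before starting the container, and Django then checks each request’s Host header against the resulting list. This is a simplification of Django’s actual check (which also handles wildcards and ports), and the pod IP is a made-up example:

```python
# What ALLOWED_HOSTS looks like after Kubernetes has substituted
# $(POD_IP); Django rejects any Host header not in this list.
allowed = "bookwyrm.example.com,localhost,10.42.1.23".split(",")

def host_allowed(host: str) -> bool:
    # Simplified version of Django's host validation: exact match only.
    return host in allowed

print(host_allowed("bookwyrm.example.com"))  # True: normal traffic via Ingress
print(host_allowed("10.42.1.23"))            # True: kubelet liveness probe
print(host_allowed("10.0.0.99"))             # False: rejected with 400 Bad Request
```

Without the POD_IP entry, the kubelet’s probe requests land in the last case and the container gets restarted over and over.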

Another one is the flower auth:

  - name: FLOWER_USER
    valueFrom:
      secretKeyRef:
        name: flower
        key: user
  - name: FLOWER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: flower
        key: pw
  - name: FLOWER_BASIC_AUTH
    value: "$(FLOWER_USER):$(FLOWER_PASSWORD)"

In the docker-compose example from Bookwyrm, the credentials are provided on the command line:

  flower:
    build: .
    command: celery -A celerywyrm flower --basic_auth=${FLOWER_USER}:${FLOWER_PASSWORD} --url_prefix=flower
    env_file: .env
    volumes:
      - .:/app
      - static_volume:/app/static
    networks:
      - main
    depends_on:
      - db
      - redis_broker
    restart: on-failure

I was never really able to get this working: for reasons I’m unsure about, but which probably have something to do with string escaping, I could not log in with my credentials. So I moved them to the FLOWER_BASIC_AUTH environment variable, at which point they immediately started working.

With all of that out of the way, here is the Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bookwyrm
  labels:
    {{- range $label, $value := .Values.commonLabels }}
    {{ $label }}: {{ $value | quote }}
    {{- end }}
spec:
  replicas: 1
  selector:
    matchLabels:
      homelab/app: bookwyrm
      {{- range $label, $value := .Values.commonLabels }}
      {{ $label }}: {{ $value | quote }}
      {{- end }}
  strategy:
    type: "Recreate"
  template:
    metadata:
      labels:
        homelab/app: bookwyrm
        {{- range $label, $value := .Values.commonLabels }}
        {{ $label }}: {{ $value | quote }}
        {{- end }}
    spec:
      automountServiceAccountToken: false
      securityContext:
        fsGroup: 1000
      containers:
        - name: bookwyrm-web
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["python"]
          args:
            - "manage.py"
            - "runserver"
            - "0.0.0.0:{{ .Values.ports.web }}"
          resources:
            requests:
              cpu: 200m
              memory: 500Mi
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
          livenessProbe:
            httpGet:
              port: {{ .Values.ports.web }}
              path: "/"
            initialDelaySeconds: 15
            periodSeconds: 30
          ports:
            - name: bookwyrm-http
              containerPort: {{ .Values.ports.web }}
              protocol: TCP
        - name: bookwyrm-celery-worker
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["celery"]
          args:
            - "-A"
            - "celerywyrm"
            - "worker"
            - "-l"
            - "info"
            - "-Q"
            - "high_priority,medium_priority,low_priority,streams,images,suggested_users,email,connectors,lists,inbox,imports,import_triggered,broadcast,misc"
          resources:
            requests:
              cpu: 200m
              memory: 200Mi
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
        - name: bookwyrm-celery-beat
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["celery"]
          args:
            - "-A"
            - "celerywyrm"
            - "beat"
            - "-l"
            - "INFO"
            - "--scheduler"
            - "django_celery_beat.schedulers:DatabaseScheduler"
          resources:
            requests:
              cpu: 200m
              memory: 200Mi
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
        - name: bookwyrm-flower
          image: harbor.example.com/homelab/bookwyrm:{{ .Values.appVersion }}
          command: ["celery"]
          args:
            - "-A"
            - "celerywyrm"
            - "flower"
            - "--url_prefix=flower"
          resources:
            requests:
              cpu: 200m
              memory: 200Mi
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 11 }}
          {{- end }}
          ports:
            - name: flower-http
              containerPort: {{ .Values.ports.flower }}
              protocol: TCP

One comment on the above: take the resource requests with a grain of salt. I haven't gotten around to looking at the metrics from the first week of the deployment yet, so the values above are still the semi-random ones I drew out of a hat while writing the manifest.
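Not shown above is the Service that sits in front of the two HTTP ports so that an Ingress can route to them. Here is a minimal sketch, with the name and port numbers as illustrative assumptions (my actual chart templates these from .Values, like the Deployment):

```yaml
# Sketch of a matching Service; the name and port values here are
# illustrative, not copied from my actual chart.
apiVersion: v1
kind: Service
metadata:
  name: bookwyrm
spec:
  selector:
    homelab/app: bookwyrm
  ports:
    - name: bookwyrm-http
      port: 80
      targetPort: bookwyrm-http
    - name: flower-http
      port: 8888
      targetPort: flower-http
```

Using the named container ports as targetPort means the Service keeps working even if the numeric ports in the values file change.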

At this point, I thought I was done. But that would have been too easy.

The power of CSS

The reason I was sure I wasn’t done yet is that the home page of Bookwyrm looked like this when I first opened it:

A screenshot of the homepage of my Bookwyrm instance before logging in. It is a bit...minimal, shall we say. The only styling visible is the font size of headings and the fact that those are written in bold, and the fact that links have the typical link coloring. Everything, including text boxes for username/password entry, is completely unstyled. And everything is squished on the left side of the page.

There’s clearly something wrong.

Obviously, that’s not what it’s supposed to look like. Those of you who are a bit more familiar with webdev than I am will likely see immediately that there’s some problem with the CSS, but to me it was not quite that clear. A look into the browser console, which showed messages about the CSS file not being found, led me to the same conclusion. Opening the page source, I saw the following:

<link href="https://s3-bookwyrm.mei-home.net/css/themes/bookwyrm-light.css" rel="stylesheet" type="text/css" />

But when looking at the S3 bucket, I saw that the file actually lived under /static/.... Searching a bit, I found this bug. It had already been fixed in the newest release, v0.7.5, but I had started out with v0.7.4, as I wanted a chance to test my upgrade hook/script right away.

After updating to v0.7.5, I at least got some proper styling, but it still looked like some things were missing:

A screenshot of the homepage of my Bookwyrm instance. This time, there's definitely some styling present. But notably, some font issues are visible, with only the glyphs with the Unicode numbers showing, not the actual symbols.

Finally styled, but still with some font glyphs clearly missing.

Note especially the missing glyphs for the symbols above “Dezentral”, “Freundlich” and “Nichtkommerziell” (“Decentralized”, “Friendly” and “Non-commercial”). And please forgive the partial German, I hadn’t noticed the language mix when taking the screenshot.

Looking at the browser console again, I saw this error message. Checking a bit further, I found that I had missed a part of Bookwyrm’s S3 setup docs. I followed these docs from Hetzner to apply the necessary CORS config to my S3 bucket. I couldn’t directly apply the JSON config provided in the Bookwyrm docs, because s3cmd, my default S3 tool, doesn’t support JSON for the CORS config, only XML. So I translated it to this:

<CORSConfiguration>
  <CORSRule>
    <AllowedHeader>*</AllowedHeader>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>DELETE</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <ExposeHeader>Etag</ExposeHeader>
    <AllowedOrigin>https://bookwyrm.example.com</AllowedOrigin>
  </CORSRule>
</CORSConfiguration>

I stored the above XML config into a cors.xml file and applied it to my Bookwyrm bucket with this command:

s3cmd -c s3-conf setcors cors.xml s3://bookwyrm/

Here, s3-conf is the s3cmd config for my Ceph S3 setup.
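The translation from JSON to XML is mechanical enough that it could also be scripted. As a small illustration (purely a sketch; I did the translation by hand, and the origin below is a placeholder), this stdlib-only Python snippet renders a JSON-style CORS rule as the XML that s3cmd's setcors expects:

```python
# Sketch: converting a JSON-style S3 CORS rule (as in the Bookwyrm docs)
# into the XML format that s3cmd's setcors expects. The origin below is
# a placeholder, not my real instance URL.
import xml.etree.ElementTree as ET

def cors_json_to_xml(rule: dict) -> str:
    """Render a single CORS rule dict as an S3 CORSConfiguration document."""
    root = ET.Element("CORSConfiguration")
    xml_rule = ET.SubElement(root, "CORSRule")
    # Plural JSON keys map to repeated singular XML elements.
    mapping = {
        "AllowedHeaders": "AllowedHeader",
        "AllowedMethods": "AllowedMethod",
        "AllowedOrigins": "AllowedOrigin",
        "ExposeHeaders": "ExposeHeader",
    }
    for json_key, xml_tag in mapping.items():
        for value in rule.get(json_key, []):
            ET.SubElement(xml_rule, xml_tag).text = value
    if "MaxAgeSeconds" in rule:
        ET.SubElement(xml_rule, "MaxAgeSeconds").text = str(rule["MaxAgeSeconds"])
    return ET.tostring(root, encoding="unicode")

rule = {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "HEAD", "POST", "PUT", "DELETE"],
    "AllowedOrigins": ["https://bookwyrm.example.com"],
    "ExposeHeaders": ["Etag"],
    "MaxAgeSeconds": 3000,
}
print(cors_json_to_xml(rule))
```

To verify that the rule took effect, requesting one of the CSS files with curl and an Origin header should show an Access-Control-Allow-Origin response header.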

And after that, I was finally done: Bookwyrm looked like it was supposed to! 🎉

Initial network sync?

After I had finally set up my instance, I started to enter a few books, mostly for testing purposes. Which was when I realized that I could hear a lot of disk activity. And looking at my metrics, I found that the Bookwyrm container was using a lot of CPU:

Screenshot of a Grafana time series graph. It shows the entirety of August 24th, from 00:00 to 23:59. For most of this time, the graph, which is from the bookwyrm-celery-worker container, shows more or less a flat line around 0.01, with only very occasional spikes to 0.6 at max. Then came 16:21, and the CPU utilization suddenly went up to peaks of 1.7 and did not get lower than 0.6 anymore, mostly oscillating around 1.4. This went on until about 21:45, when the line went back to 0.01.

CPU utilization of the bookwyrm-celery-worker container.

Looking around a bit more, I also found that there were a lot of new objects created in my S3 pool on Ceph:

Another Grafana time series screenshot. This time, it shows the object creation and deletion in the Ceph pool used for data storage for my S3 setup. It again shows the entire day, from 00:00 to 23:59, and mostly stays around 0, meaning no objects are created or deleted. But there is a very regular spike of 12 new objects being created every five minutes. Besides that, there are a couple of spikes, both for lots of added and lots of removed objects. The main event again happens starting around 16:21, with the creations suddenly increasing to about 600 objects. This goes on, like the Celery CPU usage from the previous graph, to about 21:45, when it returns to the previous levels.

Object changes in the S3 data pool; negative values are removed objects, positive values are added objects.

So it seemed that something was going on with Bookwyrm, but I had no idea what it might be. Checking the S3 bucket, I saw a lot more book covers appearing in there. But I hadn’t even done much at that point, just added a handful of books, so I was flailing a bit, trying to figure out what the cause might be. Then I had the idea of looking at Flower, which the Bookwyrm docs advertise as a way to look at ongoing tasks.

This was the picture presented to me at the time:

A screenshot of Flower's task list. It shows an entire screen of 15 tasks, all started between 19:39:15 and 19:39:26. The task names only have two variations, 'base_activity.set_related_field' and 'add_status_task'. The args are also shown, and all seem to be additions of 'Works', which I think are books in Bookwyrm's object model.

List of tasks in Flower.

Noteworthy is that most of the tasks are related to Work objects, which, if I’m not mistaken, are books in Bookwyrm’s object model. So a lot of things seemed to be happening to a lot of books, even though I had only added two or three books myself at that point and hadn’t followed a single person yet. Also note that the tasks all started within the same minute, 19:39. And it went on and on like this.

Then I saw that there was a link to my instance in the args column, and I clicked one of the tasks to get to this details page:

A screenshot of Flower's task details for one of the 'base_activity.set_related_field' tasks. The important part here is the full content of the args value: 'Edition,Work,parent_work,https://bookwyrm.mei-home.net/book/16858,https://bookwyrm.social/book/151006'.

Example of task details.

I then checked which book the https://bookwyrm.mei-home.net/book/16858 URL from the args value points to:

A screenshot of the Bookwyrm book page for Stephen King's The Dark Tower.

This was the book the Flower task related to.

The thing is: I hadn’t interacted with that book at all. So I tried a few more books from other Flower tasks, and it was the same story: books I had not interacted with. So the only conclusion I can draw for now is that Bookwyrm looks at all known instances, downloads their entire database of books and adds it to my own instance?

If you actually know what’s going on here, please contact me at my Mastodon account and tell me. I’m genuinely curious.

Final thoughts

I’m really curious what that initial database sync (?) was for.

The Bookwyrm setup also holds one last challenge: resisting the temptation to enter all the books I’ve read in the last 32 years. 😅

Last but not least, if you’d like to follow my reading, I’m https://bookwyrm.mei-home.net/user/mmeier.