In the last post of my Backup Operator series, I lamented the state of permissions in the kopf Kubernetes Operator framework. After some thinking, I decided to go ahead with kopf and just accept the permission/RBAC ugliness.
I’ve just finished implementing the first cluster state change in the operator, so I thought this would be a good place to write a post about my approach and setup.
The journey up to now has been pretty interesting. I learned a bit about the Kubernetes API, and a lot about how cooperative multitasking with coroutines works in Python.
Why write an entire operator?
I’ve already written some things about my backup setup in the Kubernetes migration post which triggered this operator implementation.
Just to give a short refresher: I need to run daily backups on the persistent volumes and S3 buckets of the services running in my Homelab. I’m currently doing that by launching a run-to-completion job on each of my Nomad hosts, which backs up all the volumes that happen to be mounted on that host at the time. I can’t do that in k8s, because it seems to lack a run-to-completion, run-on-every-host type of workload. Jobs can do the run-to-completion part, and DaemonSets can do the run-on-every-host part, but there doesn’t seem to be a workload type which does both in one. And that’s why I’ve decided to write my own operator.

There are two main benefits this approach will have compared to my previous one. First, I will be able to explicitly schedule the second stage of my backup, which copies certain backups onto an external disk. Right now, I just schedule that phase an hour after the previous one. Second, I will be able to package the backup config with each individual service. In my current approach, the definition of which volumes and buckets to back up lives in the backup job’s config. With the Kubernetes operator, I will introduce a CRD that can be deployed together with each service, e.g. as part of the Helm chart.
Overview of the approach
As I’ve mentioned above, I will write the operator in Python and use the kopf framework to do it. This is simply because I’m currently familiar with three languages: C++, C and Python. And Python is the most comfortable of the three. Due to the RBAC problems I described in my last post, I briefly looked into other possibilities. But the Kubernetes ecosystem seems to mostly live in Golang, which I haven’t written anything in yet. And the main goal currently is to get ahead with the Homelab migration to k8s, not to learn yet another programming language. 🙂
There will be a total of three custom resources the operator will look for. The first one, HomelabBackupConfig, will be a one-per-cluster resource and looks like this:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: homelabbackupconfigs.mei-home.net
spec:
scope: Namespaced
group: mei-home.net
names:
kind: HomelabBackupConfig
plural: homelabbackupconfigs
singular: homelabbackupconfig
versions:
- name: v1alpha1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
description: "This object describes the general configuration of all backups created by the Homelab backup operator."
properties:
spec:
type: object
properties:
serviceBackup:
type: object
description: "The configuration for all service level backups created by the operator instance."
properties:
schedule:
type: string
description: "The schedule on which all service level backups will be executed."
scratchVol:
type: string
description: "The name of the PVC for scratch space. Needs to be a RWX volume."
s3BackupConfig:
type: object
description: "Configuration for S3 access to the backup buckets."
properties:
s3Host:
type: string
description: "The S3 server hosting the backup buckets."
s3Credentials:
type: object
description: "The S3 credentials for the backup S3 user."
properties:
secretName:
type: string
description: "The name of the Secret containing the credentials."
accessKeyIDProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID"
secretKeyProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY"
s3ServiceConfig:
type: object
description: "Configuration for S3 access to the service buckets which should be backed up."
properties:
s3Host:
type: string
description: "The S3 server hosting the buckets which should be backed up."
s3Credentials:
type: object
description: "The S3 credentials for the service S3 user."
properties:
secretName:
type: string
description: "The name of the Secret containing the credentials."
accessKeyIDProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID"
secretKeyProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY"
resticPasswordSecret:
type: object
description: "The Secret with the Restic password for the backups."
properties:
secretName:
type: string
description: "The name of the Secret containing the password."
secretKey:
type: string
description: "The name of the property in the secretName Secret which contains the Restic password."
jobSpec:
type: object
description: "Configuration of the Job launched for each service backup."
properties:
image:
type: string
description: "The container image to be used for all service Jobs."
command:
type: array
description: "The command handed to Job.spec.template.containers.command"
items:
type: string
env:
type: array
description: "Additional entries for the containers.env list. These entries cann only be of the name,value variety. Other forms of env entries are not supported for now."
items:
type: object
properties:
name:
type: string
description: "The name of the env variable to add."
value:
type: string
description: "The value of the env variable to add."
This resource configures all of the common settings which will be shared by all of the individual service backups I will describe next. My backups will be running with restic, backing up into S3 buckets on my Ceph Rook cluster, one for each service. As all service level backups will work like this, and back up to the same S3 service, it makes sense to centralize the configuration instead of copying it into every service backup CRD. This configuration happens in the s3BackupConfig:
s3BackupConfig:
type: object
description: "Configuration for S3 access to the backup buckets."
properties:
s3Host:
type: string
description: "The S3 server hosting the backup buckets."
s3Credentials:
type: object
description: "The S3 credentials for the backup S3 user."
properties:
secretName:
type: string
description: "The name of the Secret containing the credentials."
accessKeyIDProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_ACCESS_KEY_ID"
secretKeyProperty:
type: string
description: "The name of the property in the secretName secret with the AWS_SECRET_ACCESS_KEY"
Flexibility in what the k8s Secrets have to look like is pretty important to me. I’ve been annoyed with some of the Helm charts I’ve been using prescribing exactly what the properties in the Secret need to be named, so I introduced config options here to define not only the Secret’s name, but also the names of the properties holding the access key and secret key of the S3 credentials.
The s3ServiceConfig has the same structure, but will be used for the credentials for accessing the S3 buckets of the services themselves, instead of the S3 backup buckets. The resticPasswordSecret configures the restic password used to unlock the restic encryption keys.
Finally, there’s the jobSpec. This will likely still change in the future, as I have not yet implemented that part. It will be used to create the Jobs which run the actual backups, one for each of the HomelabServiceBackup instances I will describe next. I won’t go into detail on this part of the CRD today and will instead save it for when I’ve actually implemented the Job creation.
Then there’s the HomelabServiceBackup CRD:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: homelabservicebackups.mei-home.net
spec:
scope: Namespaced
group: mei-home.net
names:
kind: HomelabServiceBackup
plural: homelabservicebackups
singular: homelabservicebackup
versions:
- name: v1alpha1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
description: "This object describes the configuration of the backups for a specific service."
properties:
spec:
type: object
properties:
backupBucketName:
type: string
description: "The name of the S3 bucket to which the backup should be made."
backups:
type: array
description: "The elements, like PVCs and S3 buckets to back up for this service."
items:
type: object
properties:
type:
type: string
description: "The Type of the element, either s3 or pvc."
enum:
- s3
- pvc
name:
type: string
description: "The name of the element, either the name of an S3 bucket or a PVC"
status:
type: object
description: "Status of this service backup"
properties:
nextBackup:
type: string
description: "Date and time of the next backup run"
lastBackup:
type: object
description: "Status of latest backup"
properties:
state:
type: integer
description: "State of the last backup. 1: Failed, 0: Successful"
timestamp:
type: string
description: "Date and time the last backup run was executed"
This CRD describes the backups to be done for an individual service. It contains two main parts, the status and the spec. In the spec, I’m configuring the S3 bucket to be used for the backup, and a list of things to back up. Right now, I’ve only got PersistentVolumeClaims and S3 buckets in mind. An instantiation might look like this:
apiVersion: mei-home.net/v1alpha1
kind: HomelabServiceBackup
metadata:
name: test-service-backup
namespace: backup-tests
labels:
homelab/part-of: hlbo
spec:
backupBucketName: "non-existant-bucket"
backups:
- type: pvc
name: non-existant-pvc
- type: pvc
name: another-non-existant-pvc
- type: s3
name: non-existant-S3-bucket
Kopf overview
Kopf has a relatively nice approach to listening for changes to the resources it is supposed to be watching. It makes use of Kubernetes’ watch API, and then combines the raw Kubernetes events to provide a nicer interface than plain events alone could offer.
The main mechanism is event handlers. These handlers can be defined for each of four different event categories:
- Creation of a new resource
- Resume of the handler for an already existing resource after an operator restart
- Deletion of a resource
- Change of a resource
In addition, there are daemons, which are long running handlers. Instead of running to completion for every event, they stay active from the moment a resource is created to the moment it is deleted. They are automatically started up after operator restarts as well.
Finally, there is a generic event handler, which gets the full firehose of Kubernetes events, without the niceties like diffs that you get with kopf’s event category handlers.
The handlers are Python functions with a decorator which describes the event category they should listen on and the CRD they should listen for. Those decorators can also be combined, so the same Python function can handle both the creation of a new resource and the resumption after an operator restart.
Handlers generally come in two flavors, using threads or using coroutines. I spontaneously decided to go with the coroutine approach, because I had never before used Python’s asyncio feature, but I was familiar with coroutines in C and C++.
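To make these handler types a bit more concrete, here is a minimal sketch of the different decorator styles. The handler bodies are just placeholders, not my actual handlers; those follow below.
import asyncio
import kopf

# Runs to completion once per event; the same function can handle
# both creation and resumption after an operator restart.
@kopf.on.create('homelabbackupconfigs')
@kopf.on.resume('homelabbackupconfigs')
async def created_or_resumed(spec, name, logger, **_):
    logger.info(f"Saw HomelabBackupConfig {name}: {spec}")

@kopf.on.delete('homelabbackupconfigs')
async def deleted(name, logger, **_):
    logger.info(f"HomelabBackupConfig {name} is gone")

# Long-running handler: one instance per resource, alive from the
# creation of that resource until its deletion.
@kopf.daemon('homelabservicebackups')
async def per_resource_daemon(name, stopped, **_):
    while not stopped:
        await asyncio.sleep(60)

# Generic low-level handler: receives every raw watch event,
# without the diffs and other niceties of the handlers above.
@kopf.on.event('homelabservicebackups')
async def raw_events(event, logger, **_):
    logger.debug(f"Raw event: {event['type']}")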
Handling the HomelabBackupConfig CRD
There isn’t too much to do with the generic handling for this CRD. There is only ever supposed to be one of those, and the only thing which needs to be done with it is to store it in memory in the operator and make it available to the handlers of the HomelabServiceBackup CRD, so they can use the configs to launch their job.
The implementation of the handlers themselves I kept pretty simple:
import asyncio
import kopf
import hl_backup_operator.homelab_backup_config as backupconf
@kopf.on.startup()
async def create_backup_config_cond(memo, **_):
memo.backup_conf_cond = asyncio.Condition()
@kopf.on.create('homelabbackupconfigs')
@kopf.on.resume('homelabbackupconfigs')
@kopf.on.update('homelabbackupconfigs')
async def create_resume_update_handler(spec, meta, memo, **kwargs):
await backupconf.handle_creation_and_change(meta["name"],
memo.backup_conf_cond, spec)
@kopf.on.delete('homelabbackupconfigs')
async def delete_handler(meta, **kwargs):
backupconf.handle_deletion(meta["name"])
This sets up a combined handler for creation, resumption and updates of the CRD. It also creates a Condition which I will later use in the HomelabServiceBackup handlers to notify them when the config changed.
The homelab_backup_config
module looks like this:
import datetime
import logging
import croniter
__CONFIG = None
async def handle_creation_and_change(name, cond, spec):
global __CONFIG
__CONFIG = spec
logging.info(f"Set backup config from {name} to: {spec}")
async with cond:
cond.notify_all()
def handle_deletion(name):
global __CONFIG
__CONFIG = None
logging.warning(f"Config {name} deleted. No backups will be scheduled!")
def get_config():
return __CONFIG
def get_next_service_time():
if not __CONFIG:
logging.error("Service schedule time requested, but no config present."
)
return None
if ("serviceBackup" not in __CONFIG
or "schedule" not in __CONFIG["serviceBackup"]):
logging.error("Config serviceBackup.schedule is missing.")
return None
now = datetime.datetime.now(datetime.timezone.utc)
return croniter.croniter(__CONFIG["serviceBackup"]["schedule"], now
).get_next(datetime.datetime)
def get_service_backup_spec():
if not __CONFIG or "serviceBackup" not in __CONFIG:
logging.error("Config serviceBackup is missing.")
return None
else:
return __CONFIG["serviceBackup"]
As I said, I kept it really simple.
This implementation stores the spec as received from the handler in a module
level variable __CONFIG
and then has a couple functions to make it available
to the rest of the operator.
The only really interesting part is the get_next_service_time
function. It
looks at the spec.serviceBackup.schedule
value, which is a string in cron
format, for example like this:
spec:
serviceBackup:
schedule: "30 18 * * *"
I decided to keep all times in UTC internally, just to prevent confusing myself. Instead of writing my own cron parser, I used croniter. It doesn’t just provide a parser for the cron format, but also provides a helper to get the time and date of the next scheduled execution, which I make use of here.
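As a quick illustration of what that looks like in croniter (the times here are made-up examples):
import datetime
import croniter

schedule = "30 18 * * *"  # every day at 18:30
now = datetime.datetime(2024, 5, 25, 10, 0, tzinfo=datetime.timezone.utc)

# get_next() returns the first scheduled time after 'now' as a datetime.
next_run = croniter.croniter(schedule, now).get_next(datetime.datetime)
print(next_run)  # 2024-05-25 18:30:00+00:00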
Implementing the HomelabServiceBackup handling
The HomelabServiceBackup resource describes the backup for an individual service. In the operator, it will ultimately need to launch a Job to run the backup of the configured PersistentVolumeClaims and S3 buckets belonging to the service.
The first thing I implemented was waiting for the scheduled execution time of the backup. For this, I initially thought to use kopf’s timers, but quickly realized that those only allow a fixed interval. I needed an adaptable wait, depending on the schedule configured in the HomelabBackupConfig. For that reason, I reached for kopf’s daemons. These are long-running handlers; one is created for each instance of the watched resource.
The handler function itself is again simple, as I just call a separate function in a module:
import asyncio
import kopf
import hl_backup_operator.homelab_service_backup as servicebackup
@kopf.on.startup()
async def create_backup_config_cond(memo, **_):
memo.backup_conf_cond = asyncio.Condition()
@kopf.daemon("homelabservicebackups", initial_delay=30)
async def service_backup_daemon(name, namespace, spec, memo, stopped, **_):
await servicebackup.homelab_service_daemon(name, namespace, spec, memo,
stopped)
The daemon will spend most of its time waiting, as it only needs to do something in two cases:
- When the scheduled time for a backup has arrived
- When the backup schedule changes
Let’s look at the second case first. It is the reason for using the memo, a generic container handled by kopf and made available to all handlers. I’m creating a Condition during operator startup, and every daemon will wait on it. The handler for HomelabBackupConfig updates will notify all waiters on that Condition when the HomelabBackupConfig changes. This is necessary because the schedule is configured in the HomelabBackupConfig, so daemons might need to adjust their wait timer.
Here is what that waiting currently looks like:
class WakeupReason(Enum):
TIMER = auto()
SCHEDULE_UPDATE = auto()
async def cond_waiter(cond):
async with cond:
await cond.wait()
async def wait_for(waittime, update_condition):
cond_task = asyncio.create_task(cond_waiter(update_condition),
name="condwait")
sleep_task = asyncio.create_task(asyncio.sleep(waittime), name="sleepwait")
done, pending = await asyncio.wait([cond_task, sleep_task],
return_when=asyncio.FIRST_COMPLETED)
for p in pending:
p.cancel()
wake_reasons = []
for d in done:
if d.get_name() == "condwait":
wake_reasons.append(WakeupReason.SCHEDULE_UPDATE)
elif d.get_name() == "sleepwait":
wake_reasons.append(WakeupReason.TIMER)
return wake_reasons
As I’ve noted before, I’m using Python’s asyncio module, so instead of threads,
I’m using coroutines. Luckily, the Python standard library already provides the
means to wait for multiple tasks and even tell me which task is done waiting
when the function returns. So here, I’m creating two tasks. One is waiting on
the given waittime
. This is the difference between the current time and the
next scheduled backup, in seconds. The second one is waiting on the condition
I mentioned previously. This condition will be notified by the handler for the
HomelabBackupConfig when that resource changes. This is necessary because the
daemon might need to adjust its wait time if the schedule for backups has changed.
Finally, I check which task finished waiting and return a list of enums telling the caller why it woke up, so it can take different actions.
Then there’s the main loop of the daemon:
async def homelab_service_daemon(name, namespace, spec, memo, stopped):
logging.info(f"Launching daemon for {namespace}/{name}.")
while not stopped:
logging.debug(f"In main loop of {namespace}/{name} with spec: {spec}")
next_run = backupconfig.get_next_service_time()
wait_time = next_run - datetime.datetime.now(datetime.timezone.utc)
await wait_for(wait_time.total_seconds(), memo.backup_conf_cond)
logging.info(f"Finished daemon for {namespace}/{name}.")
This doesn’t do much at the moment, as I haven’t implemented the backups
themselves yet. It runs in an endless loop, checking the stopped
variable,
which will be set to True
by kopf if the HomelabServiceBackup this daemon is
handling is deleted or the operator is stopped. Kopf will also throw a
CancelledError
into the coroutine in those cases, so the daemon will also be stopped when it
is currently waiting.
The waiting time is computed with the get_next_service_time
function I discussed
above.
Implementing status updates
The goal which triggered this blog post was finally getting the scheduled triggering and the updates of the HomelabServiceBackup’s status implemented, which was my first change of the cluster state via the operator.
My goal was to have each daemon update a field in its HomelabServiceBackup resource with the scheduled time of the next backup, which would ultimately look like this:
status:
nextBackup: "2024-05-25T18:30:00+00:00"
The status.nextBackup
field is what I was interested in setting. I first
looked at the Kubernetes Python Client,
but found that it did not support asyncio. But I quickly found
kubernetes_asyncio.
An interesting thing I learned while looking at these two libraries is that they
were, for the most part, not hand-written. Instead, they use the openapi-generator
to automatically generate the API code from the Kubernetes API definition. Which
is pretty cool to see, to be honest. It leads to boatloads of repeated code, but
the alternative of writing all that code by hand probably doesn’t bear thinking
about.
Of course, one of the downsides of using the Python API client was that it would not have API support for the CRDs I’ve written for my own cluster. Instead, I needed to use the generic CustomObjectsApi.
Initially, because I wanted to specifically update the status of my resources, I looked at the patch_namespaced_custom_object_status API. But running that API against a resource which did not have the status set yet just returns a 404. It took me a long while to realize that the 404 was not due to an error on my end, but simply because the resource needed to have a status already for the status API to work.
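A rough sketch of that first attempt (this is my reconstruction, not the literal code from the operator):
from kubernetes_asyncio import client

async def patch_status_subresource(api_client, namespace, name, status):
    # The status-subresource call I tried first: as long as the resource
    # has no status at all yet, the API server answers with a 404.
    co_api = client.CustomObjectsApi(api_client)
    return await co_api.patch_namespaced_custom_object_status(
        "mei-home.net", "v1alpha1", namespace, "homelabservicebackups",
        name, body={"status": status})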
So instead, I reached for the patch_namespaced_custom_object API. That, too, had a lot of issues. I initially thought I was the first person to use the Python API package for accessing custom objects. All the examples I could find stated that this should work:
import asyncio
from kubernetes_asyncio import client, config
from kubernetes_asyncio.client.api_client import ApiClient
from pprint import pprint
import json
async def main():
await config.load_kube_config()
async with ApiClient() as api:
mine = client.CustomObjectsApi(api)
res = await mine.patch_namespaced_custom_object("mei-home.net", "v1alpha1",
"backups", "homelabservicebackups", "test-service-backup",
body={"status":{"lastBackup": {"state":1, "timestamp":"foobar"}}}
)
pprint(res)
asyncio.run(main())
But it did not. Instead, I kept getting errors like this back:
kubernetes_asyncio.client.exceptions.ApiException: (415)
Reason: Unsupported Media Type
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"the body of the request was in an unknown format - accepted media types
include: application/json-patch+json, application/merge-patch+json,
application/apply-patch+yaml",
"reason":"UnsupportedMediaType",
"code":415}
I finally found this bug. It seems to indicate that the issue is a wrong media type being set in the content-type header. This led me to the examples file, which shows that a specific content type can be forced by adding _content_type='application/merge-patch+json' as a parameter to the patch_namespaced_custom_object call. With that addition, I was finally able to properly update the time of the next backup in the status, by adding these lines to the homelab_service_daemon function from before:
status_body = {
"status": {
"nextBackup": next_run.isoformat()
}
}
await kubeapi.patch_mei_home_custom_object(
namespace, kubeapi.HOMELABSERVICEBACKUP_PLURAL, name, status_body)
The patch_mei_home_custom_object
function is just a thin wrapper around
the patch_namespaced_custom_object
function from above.
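The wrapper itself isn’t shown in this post, but it boils down to something like the following sketch. The constants and the assumption that the client configuration is loaded at operator startup are mine:
from kubernetes_asyncio import client
from kubernetes_asyncio.client.api_client import ApiClient

GROUP = "mei-home.net"
VERSION = "v1alpha1"
HOMELABSERVICEBACKUP_PLURAL = "homelabservicebackups"

async def patch_mei_home_custom_object(namespace, plural, name, body):
    # Assumes config.load_incluster_config() or config.load_kube_config()
    # has already been called during operator startup.
    async with ApiClient() as api:
        co_api = client.CustomObjectsApi(api)
        return await co_api.patch_namespaced_custom_object(
            GROUP, VERSION, namespace, plural, name, body=body,
            _content_type="application/merge-patch+json")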
Some notes on testing
Writing UTs was not always simple here. First of all, I needed to employ a lot of mocks to remove any attempted k8s cluster access. I’m seriously considering buying some additional Pis and setting up a test cluster. 😁
My first generic issue was: How do I even properly unit test asyncio code?
Luckily, that issue was easy to answer, at least in the abstract: I used pytest-asyncio. It allows me to add @pytest.mark.asyncio as a decorator on my test functions, or on entire test classes, and the pytest plugin will automatically set up the event loop infrastructure and execute the test functions with it.
Still, I had a particular challenge with testing the waiting code, specifically when it comes to testing whether the Condition properly fires. As a reminder, here is what the code looks like:
async def cond_waiter(cond):
async with cond:
await cond.wait()
async def wait_for(waittime, update_condition):
cond_task = asyncio.create_task(cond_waiter(update_condition),
name="condwait")
sleep_task = asyncio.create_task(asyncio.sleep(waittime), name="sleepwait")
done, pending = await asyncio.wait([cond_task, sleep_task],
return_when=asyncio.FIRST_COMPLETED)
for p in pending:
p.cancel()
wake_reasons = []
for d in done:
if d.get_name() == "condwait":
wake_reasons.append(WakeupReason.SCHEDULE_UPDATE)
elif d.get_name() == "sleepwait":
wake_reasons.append(WakeupReason.TIMER)
return wake_reasons
And here is my initial attempt at the test code:
import asyncio
import pytest
from unittest.mock import AsyncMock, Mock
import hl_backup_operator.homelab_service_backup as sut
@pytest.mark.asyncio
class TestCondWait:
async def test_cond_wait_works(self):
cond = asyncio.Condition()
test_task = asyncio.create_task(sut.wait_for(15, cond))
async with cond:
cond.notify_all()
await test_task
res = test_task.result()
assert res == [sut.WakeupReason.SCHEDULE_UPDATE]
I’m trying to test whether the Condition works properly. My thinking is that the code path goes like this:
- [testcode]: Creates an async task ready to run, executing the function under test.
- [appcode]: Runs until it hits the asyncio.wait line.
- [appcode]: Waits for either the timer to expire or the Condition to be triggered, handing execution back to the [testcode].
- [testcode]: Executes the cond.notify_all function.
- [testcode]: Awaits the task, handing execution back to [appcode].
- [appcode]: Gets notified in cond_waiter and runs to completion.
But that was not what happened. Sprinkling in some print statements, I found that the test code continues running after the create_task call, straight through the notify_all call. The first time wait_for gets to do anything is when the test code hits the await test_task line, and only then does it reach the await cond.wait line. But at this point, the test code has already executed the notify_all, so the wait_for function does not return until the timer of the sleepwait task expires, resulting in a failed UT.
The only way I found around this issue is to have the test code explicitly hand off execution. I did this by introducing an await asyncio.sleep(0.05) before the async with cond: line of the test function. With that, the wait_for function gets to run until it hits the await cond.wait, gets properly notified, and the test reliably succeeds.
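For reference, the test with that extra sleep added then looks like this:
import asyncio
import pytest

import hl_backup_operator.homelab_service_backup as sut

@pytest.mark.asyncio
class TestCondWait:
    async def test_cond_wait_works(self):
        cond = asyncio.Condition()
        test_task = asyncio.create_task(sut.wait_for(15, cond))
        # Hand execution off so wait_for() gets to run up to its
        # 'await cond.wait()' before we notify the Condition.
        await asyncio.sleep(0.05)
        async with cond:
            cond.notify_all()
        await test_task
        assert test_task.result() == [sut.WakeupReason.SCHEDULE_UPDATE]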
This was, yet again, a case where the UT ends up being more complicated than the actual code.
One more issue I hit had to do with the merciless advance of time. Have another
look at the homelab_service_daemon
function:
async def homelab_service_daemon(name, namespace, spec, memo, stopped):
logging.info(f"Launching daemon for {namespace}/{name}.")
while not stopped:
logging.debug(f"In main loop of {namespace}/{name} with spec: {spec}")
next_run = backupconfig.get_next_service_time()
wait_time = next_run - datetime.datetime.now(datetime.timezone.utc)
status_body = {
"status": {
"nextBackup": next_run.isoformat()
}
}
await kubeapi.patch_mei_home_custom_object(
namespace, kubeapi.HOMELABSERVICEBACKUP_PLURAL, name, status_body)
await wait_for(wait_time.total_seconds(), memo.backup_conf_cond)
logging.info(f"Finished daemon for {namespace}/{name}.")
It has to compute the waiting time as the difference between the current time
and the time of the next scheduled backup. But how to handle datetime.now
in
UTs? I initially tried to do this with a bit of fuzziness when comparing the
arguments handed to the mocked wait_for
with the expected wait time, but that
seemed a bit too brittle.
Freezegun to the rescue. It provides a
nice API to patch datetime.now
(and several other related functions) so that
it always returns a deterministic value.
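In isolation, it behaves roughly like this:
import datetime
import freezegun

with freezegun.freeze_time("2024-05-22 19:12:10"):
    # Inside the context, "now" is pinned to the frozen instant.
    print(datetime.datetime.now())  # 2024-05-22 19:12:10
print(datetime.datetime.now())      # back to the real current time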
Using it in a UT to verify that homelab_service_daemon
calls wait_for
as
expected could look like this:
@pytest.fixture()
def mock_wait_for(self, mocker):
wait_for_mock = AsyncMock(spec=sut.wait_for)
mocker.patch('hl_backup_operator.homelab_service_backup.wait_for',
side_effect=wait_for_mock)
return wait_for_mock
async def test_daemon_waits_correctly(self, mocker, mock_wait_for):
mock_memo = Mock()
mock_stopped = Mock()
mock_stopped_bool = Mock(side_effect=[False, True])
mock_stopped.__bool__ = mock_stopped_bool
time_now = datetime(year=2024, month=5, day=22, hour=19, minute=12,
second=10, tzinfo=timezone.utc)
time_trigger = datetime(year=2024, month=5, day=22, hour=19, minute=12,
second=12, tzinfo=timezone.utc)
mock_next_service_time = Mock(return_value=time_trigger)
mocker.patch(
'hl_backup_operator.homelab_backup_config.get_next_service_time',
side_effect=mock_next_service_time)
with freezegun.freeze_time(time_now):
await sut.homelab_service_daemon("tests", "testns", {}, mock_memo,
mock_stopped)
mock_wait_for.assert_awaited_once_with(2, mock_memo.backup_conf_cond)
I’m mocking away both the wait_for and get_next_service_time functions, and I’m also defining two fixed times: one “current” time and one trigger time. In the with freezegun.freeze_time(time_now) context, datetime.now will reliably return time_now instead of the actual current time. With that, I don’t need to rely on any fuzziness when testing time-related functionality.
Next steps
Now that I’m finally happy with the groundwork, I still need to implement a couple of features before starting on the implementation of the backup Jobs themselves.
The first one is proper handling of the case where there is no HomelabBackupConfig
configured. Currently, the homelab_service_daemon
function would crash, because
get_next_service_time
would return None
, due to not having any configured
schedule. That is easily fixable by extending the waiting time to “forever”.
With the Condition mechanism already in place, the daemons will be woken up once
a HomelabBackupConfig appears and can then return to the right schedule.
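A minimal sketch of how I imagine that fix in the daemon’s main loop, assuming I keep the structure shown above (this is not implemented yet, and the status update is omitted for brevity):
async def homelab_service_daemon(name, namespace, spec, memo, stopped):
    while not stopped:
        next_run = backupconfig.get_next_service_time()
        if next_run is None:
            # No HomelabBackupConfig present: wait "forever", i.e. only on
            # the Condition, which fires once a config is created.
            await cond_waiter(memo.backup_conf_cond)
            continue
        wait_time = next_run - datetime.datetime.now(datetime.timezone.utc)
        await wait_for(wait_time.total_seconds(), memo.backup_conf_cond)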
The second feature currently missing is mostly for testing purposes. Right now, I’m only able to centrally set the schedule, which would be applicable for all service daemons. This is bound to become cumbersome once I want to start testing the Job creation and monitoring, so I will want the possibility to trigger a single service daemon’s backup immediately. I will likely introduce another parameter into the HomelabServiceBackup CRD which makes the daemon trigger a backup immediately.
Alright, that’s all I have to say for now. This is my first “programming” post on this blog, and I’m honestly not sure how it came out. Were you actually able to follow, or was it a confused mess? Was it actually interesting to read? I’d be glad for some feedback, e.g. via my Fediverse account.