Wherein I migrate my HashiCorp Vault instance to the Kubernetes cluster.

This is part 25 of my k8s migration series.

Look at all this Yak wool. That’s how much it takes to migrate Vault from baremetal to a Kubernetes deployment. I’ve been going back and forth for quite a while, trying to decide what to do with my Vault instance. It’s the one piece of HashiCorp software I do not currently plan to get rid of. But there was a problem: My Vault, or rather the High Availability nature of it, relied on HashiCorp’s Consul and its DNS service discovery functionality. And while I did want to keep Vault, I did not want to keep Consul. And I also didn’t really want to introduce some other sort of method, like HAProxy.

In the end, I sat down and thought quite hard for quite a while, mostly thinking about potential reasons for why I should not move Vault to the Kubernetes cluster. My main worry is bootstrapping - what happens if my entire Homelab goes down, unplanned, and all at once? Be it because I stumble over the absolutely wrong cable, or because my oven develops a short again and throws the main fuse. Could I still get my Homelab back up and do any massaging it might need?

I ended up deciding that Vault on Kubernetes should be fine. All Kubernetes Secrets are synced into the cluster anyway, and any other secrets I might need also live in my password manager. It should be fine. Watch this space for the day I find out what I overlooked. 😅

And thus began the Yak shaving.

Vault

But before we start on that mountain of wool, let’s take a short detour and look at what Vault is and what I use it for. Boiled down to the simplest terms, HashiCorp’s Vault is an API server for secrets of many, many different kinds. It supports everything from simple key-value secrets to PKI certificates. It can also serve short-lived tokens, including for HashiCorp’s other products like Consul or Nomad. I’ve used it for a number of things over the years.

For me, the most important part is the KV store. It stores all manner of passwords, keys and certificates, like my public cert. And it makes all of those available, given proper authorization, over HTTP. I use secrets from this store in my Ansible playbooks, in my Kubernetes cluster via external-secrets (for example the Mastodon secrets), and in my image generation setup for new hosts. Support for it is also widespread: in HashiCorp’s own tools, of course, but also in other tools like Ansible, where you shouldn’t confuse it with Ansible’s own Vault secret store.

In the past, I also used the Nomad secrets engine to get a short-lived token for Nomad API access for my backup solution.

Another big use case for me is as an internal, self-signed CA. During my Nomad/Vault/Consul cluster days, this was pretty important functionality, because those self-signed certs were used by all three components of my Homelab to secure their HTTPS communication. I’ve even gone to the length of installing the CA on all of my devices, so I don’t get any untrusted certificate warnings when accessing services secured with that CA. Since the introduction of Kubernetes, I’m not using the Homelab CA quite as much, but there are still a few internal things secured with it.

For a short while, I even considered using Vault as my OIDC identity provider, but in the end I decided against it. My main reason for that was that I would have needed to hang my internal secret store into the public internet, because I intended to use OIDC for some public sites. And even though I’ve got no reason to distrust HashiCorp’s security practices, and I could have only made certain paths publicly accessible, I decided against it.

So what does working with Vault actually look like? The main interface is the Vault CLI executable. You can control anything you need from the command line. But it also provides a WebUI, if that’s more your cup of tea. I never bothered with it.

The first step of working with Vault is to obtain a token for all further tasks. For this, Vault offers a plethora of auth methods, ranging from plain username/password to OIDC or TLS certs. I’m using the userpass method, which is just good old username+password. It’s comfortable for me: I can use my password manager and just copy+paste the password in. It looks something like this:

vault login -method=userpass username=myuser
Password (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                    Value
---                    -----
token                  hvs.CAESII0RlV4BS_5_A2q8mIpzYxiye0XoE-_Vvlb0YIAYfl-6Gh4KHGh2cy5sSmpvZk5QMXN2QW0wZ0c0R1A3cXV3TkQ
token_accessor         5ofJhWq55yZGOk6CJVRyBacd
token_duration         4h
token_renewable        true
token_policies         ["admin" "default"]
identity_policies      []
policies               ["admin" "default"]
token_meta_username    myuser

Don’t worry, this token has long since expired. 🙂 When you use vault login, the received token is automatically stored in ~/.vault-token, and the vault CLI as well as other tools with Vault integration check that path.

As you’d expect from a properly secured application, the tokens you’re getting have a restricted TTL. How long a token is initially valid can be configured, in addition to enabling token renewal and defining an upper bound on how long a token can live under any circumstances.
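
If you want to check how much time your current token has left, vault token lookup shows its remaining TTL, along with the policies attached to it:

vault token lookup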

Then there are also the policies. Those define what the holder of a token can actually do with it. In this case, I have the default and admin policies. The default policy mostly allows the holder to access information about the token they’re using, while admin is my admin policy, allowing full access to Vault. It looks something like this:

path "sys/health"
{
  capabilities = ["read", "sudo"]
}

# Create and manage ACL policies broadly across Vault

# List existing policies
path "sys/policies/acl"
{
  capabilities = ["list"]
}

# Create and manage ACL policies
path "sys/policies/acl/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Enable and manage authentication methods broadly across Vault

# Manage auth methods broadly across Vault
path "auth/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Create, update, and delete auth methods
path "sys/auth/*"
{
  capabilities = ["create", "update", "delete", "sudo"]
}

# List auth methods
path "sys/auth"
{
  capabilities = ["read"]
}

# Enable and manage the key/value secrets engine at `secret/` path

# List, create, update, and delete key/value secrets
path "secret/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Manage secrets engines
path "sys/mounts/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# Manage secrets engines
path "sys/remount"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# List existing secrets engines.
path "sys/mounts"
{
  capabilities = ["read"]
}

# Homenet Root CA access
path "homenet-ca*" {
  capabilities = [ "create", "read", "update", "delete", "list", "sudo" ]
}

Armed with this token, I can then for example take a look at my secrets:

vault read secret/s3_users/blog
Key                 Value
---                 -----
refresh_interval    768h
access              abcde
custom_metadata     map[managed-by:external-secrets]
secret              12345

This is a pretty nice example, in fact. It shows that the blog secret consists of two entries, access and secret, containing the standard S3 credentials. But it also has custom_metadata indicating that it wasn’t actually created by me by hand, but was pushed into Vault via an external-secrets PushSecret. I’m doing this because I need the S3 credentials for my blog both in an Ansible playbook I use to configure S3 buckets and in the k8s cluster, because that’s where the bucket and credentials are created by Rook Ceph.
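
As a side note, if only a single value is needed, for example in a script, the -field flag prints just that value, without the surrounding table:

vault read -field=access secret/s3_users/blog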

To put that same secret into Vault, the following command line could be used:

vault kv put secret/s3_users/blog access=abcde secret=12345

This would of course have the downside of putting the secret into the shell history, unless a space is added at the front. If you’d prefer having Vault take the secret from stdin, you can run the same command like this:

vault kv put secret/s3_users/blog access=abcde secret=-

This will take the access key from the parameter, but for the secret, it will ask you for the value, which keeps it out of the shell history. But this approach also has a downside, because only one key can use the - as its input. If you have more than one actually secret parameter, you can instead put all of them into a JSON file. I will demonstrate that later on when I migrate my Vault content from my baremetal instance to the Kubernetes deployment.

If you want to use Vault values from within Ansible, I’ve found the Vault lookup pretty nice to use. It can be used like this, to set a variable in a playbook:

- hosts: all
  name: Demonstration
  tags:
    - demo
  vars:
    s3_access: "{{ lookup('hashi_vault', 'secret=secret/s3_users/blog:access token='+vault_token+' url='+vault_url) }}"
    s3_secret: "{{ lookup('hashi_vault', 'secret=secret/s3_users/blog:secret token='+vault_token+' url='+vault_url) }}"

I’m setting the vault_token with Ansible’s file lookup like this:

vault_token: "{{ lookup('file', '/home/my_user/.vault-token') }}"

And because that file is automatically updated when the vault login command is used, I’m getting the current token automatically.

I will go into a bit more detail about generating certificates later as part of the Vault k8s setup.

Setting up the Helm chart

Alright. Let the Yak shaving finally commence. First of all, it’s notable that there is no official way to migrate the content of one instance to another. So I had to set up a completely new instance of Vault on k8s, instead of doing some sort of migration.

So the first step was to configure and deploy the official Helm chart, following this guide.

And here is the result:

global:
  enabled: true
  tlsDisable: false
  openshift: false
  serverTelemetry:
    prometheusOperator: false

injector:
  enabled: false

server:
  enabled: true
  logLevel: debug
  logFormat: json
  resources:
    requests:
      memory: 500Mi
      cpu: 500m
    limits:
      memory: 500Mi
  ingress:
    enabled: false
  readinessProbe:
    enabled: false
    path: "/v1/sys/health?standbyok=true&sealedcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 600
  tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  nodeSelector:
    homelab/role: "controller"
  networkPolicy:
    enabled: false
  priorityClassName: "system-cluster-critical"
  extraLabels:
    homelab/app: vault
    homelab/part-of: vault
  service:
    enabled: true
    active:
      enabled: false
    standby:
      enabled: false
    type: "LoadBalancer"
    externalTrafficPolicy: "Local"
    annotations:
      external-dns.alpha.kubernetes.io/hostname: newvault.example.com
      io.cilium/lb-ipam-ips: 300.300.300.12
  includeConfigAnnotation: true
  dataStorage:
    enabled: true
    size: "1Gi"
    storageClass: rbd-fast
  auditStorage:
    enabled: false
  dev:
    enabled: false
  extraVolumes:
    - type: secret
      name: vault-tls-certs
  extraEnvironmentVars:
    VAULT_CACERT: "/vault/userconfig/vault-tls-certs/issuing_ca"
  standalone:
    enabled: false
  ha:
    enabled: true
    raft:
      enabled: true
      setNodeId: true
      config: |
        cluster_name = "vault-k8s"
        ui = false
        disable_mlock = false
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-tls-certs/certificate"
          tls_key_file  = "/vault/userconfig/vault-tls-certs/private_key"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
          }
        }

        service_registration "kubernetes" {}        
  replicas: 3
ui:
  enabled: false
csi:
  enabled: false
serverTelemetry:
  serviceMonitor:
    enabled: false

Let me explain. I’m disabling the Ingress because I will make Vault accessible via a LoadBalancer instead. There’s no need to push it through Traefik, and using it through Traefik would just mean one more service that needs to be up and running before Vault is accessible.

Next, I’m configuring the liveness probe. It needs to be reconfigured to make sure that Vault also returns a 200 result when the pod being probed is in standby. See also these docs. And while setting Vault up, just do yourself the favor and completely disable the probes, or at least increase the initialDelaySeconds, to prevent restarts while you’re still in the process of initializing the cluster.
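
For that initial phase, the relevant part of the values.yaml could look as simple as this, to be reverted once the cluster is initialized and unsealed:

server:
  readinessProbe:
    enabled: false
  livenessProbe:
    enabled: false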

Next, I’m adding a toleration for control-plane nodes. This is mostly because those are the only nodes with local storage, so they will be up first.

And then we come to the first problem with the chart, the service setup. In my k8s cluster, I’m using Cilium’s BGP-based LoadBalancer support. And that requires the Services Cilium looks at to have a specific, configurable label. But the Vault Helm chart does not allow setting labels for the Services it creates. Perhaps my use case is just really niche? Anyway, I’m enabling only the generic vault Service, setting it to LoadBalancer and, importantly, setting the externalTrafficPolicy to Local. This means that packets arriving for Vault will directly reach the node where the active Vault Pod is running, instead of getting forwarded there by other nodes. This is particularly important for Vault, because Vault can configure tokens to be valid only when they’re coming from certain IPs. This won’t work when the source IP looks like it’s coming from another k8s node instead of the actual source host. I’m also setting a fixed IP to assign to the LoadBalancer, so I can easily set a few choice firewall rules for access to that LoadBalancer.

After the Service configs follows the HA configuration. In this config, there can be multiple Vault servers, continuously exchanging data. When the currently active server goes down, another one can take over, and the previous leader goes into the standby pool once it’s back. Note that this is a high availability setup, not a load balancing setup. When a request reaches a standby server, it is forwarded to the current active server. The standby servers never answer any requests themselves, besides those to the health endpoint, of course. In my case, the HA config mostly consists of a config snippet for the Vault config file:

cluster_name = "vault-k8s"
ui = false
disable_mlock = false
listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file = "/vault/userconfig/vault-tls-certs/certificate"
  tls_key_file  = "/vault/userconfig/vault-tls-certs/private_key"
}

storage "raft" {
  path = "/vault/data"
  retry_join {
    leader_api_addr = "https://vault-0.vault-internal:8200"
    leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
  }
  retry_join {
    leader_api_addr = "https://vault-1.vault-internal:8200"
    leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
  }
  retry_join {
    leader_api_addr = "https://vault-2.vault-internal:8200"
    leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
  }
}
service_registration "kubernetes" {}

Most interesting here is the retry_join configuration, which needs to contain the CA used to sign the TLS cert used in the listener stanza. I will explain this more deeply in the next section, where I set up the cert generation.

Once that Helm chart got deployed, a couple of things went wrong, leading to some beautiful Yak shaving.

Setting up the CiliumBGPPeeringPolicy

As I’ve noted above, the labels of the main Vault Service cannot be changed. Interestingly, the two other services, for active and standby servers, do have the option of configuring their labels. But not the main service. Another issue: The type of the Services can only be set centrally, for all three Services.

As you can read in a bit more detail here, I’m using Cilium’s BGP-based support for setting up LoadBalancer type services.

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: worker-node-bgp
  namespace: kube-system
spec:
  nodeSelector:
    matchExpressions:
      - key: "homelab/role"
        operator: In
        values:
          - "worker"
          - "ceph"
  virtualRouters:
    - localASN: 64555
      exportPodCIDR: false
      serviceSelector:
        matchExpressions:
          - key: "homelab/public-service"
            operator: In
            values:
              - "true"
      neighbors:
        - peerAddress: '300.300.300.405/32'
          peerASN: 64555
          eBGPMultihopTTL: 10
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120

The main problem here is the spec.virtualRouters[0].serviceSelector, as it only allows matching on labels - and I cannot influence the labels set on the Vault Service. I then took a very close look at the Cilium docs and found out that the selector can also select on the Service name and namespace. So I tried extending the above config like this, adding another entry in the virtualRouters list:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: worker-node-bgp
  namespace: kube-system
spec:
  nodeSelector:
    matchExpressions:
      - key: "homelab/role"
        operator: In
        values:
          - "worker"
          - "ceph"
  virtualRouters:
    - localASN: 64555
      exportPodCIDR: false
      serviceSelector:
        matchExpressions:
          - key: "homelab/public-service"
            operator: In
            values:
              - "true"
      neighbors:
        - peerAddress: '300.300.300.405/32'
          peerASN: 64555
          eBGPMultihopTTL: 10
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120
    - localASN: 64555
      exportPodCIDR: false
      serviceSelector:
        matchExpressions:
          - key: "io.kubernetes.service.name"
            operator: In
            values:
              - "vault"
          - key: "io.kubernetes.service.namespace"
            operator: In
            values:
              - "vault"
      neighbors:
        - peerAddress: '300.300.300.405/32'
          peerASN: 64555
          eBGPMultihopTTL: 10
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120

But, this did not work at all. Cilium announced either only the Services which matched the first serviceSelector or the ones matching the second, but never both. When debugging this issue, you can use cilium bgp routes to show which routes cilium advertises to neighbors.

What did end up working was introducing two peering policies. The sets of hosts they apply to also seem to need to be non-overlapping, or it again won’t work. I’ve got it configured like this now:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: worker-node-bgp
  namespace: kube-system
spec:
  nodeSelector:
    matchExpressions:
      - key: "homelab/role"
        operator: In
        values:
          - "worker"
          - "ceph"
  virtualRouters:
    - localASN: 64555
      exportPodCIDR: false
      serviceSelector:
        matchExpressions:
          - key: "homelab/public-service"
            operator: In
            values:
              - "true"
      neighbors:
        - peerAddress: '300.300.300.405/32'
          peerASN: 64555
          eBGPMultihopTTL: 10
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: controller-node-bgp
  namespace: kube-system
spec:
  nodeSelector:
    matchExpressions:
      - key: "homelab/role"
        operator: In
        values:
          - "controller"
  virtualRouters:
    - localASN: 64555
      exportPodCIDR: false
      serviceSelector:
        matchExpressions:
          - key: "io.kubernetes.service.name"
            operator: In
            values:
              - "vault"
          - key: "io.kubernetes.service.namespace"
            operator: In
            values:
              - "vault"
      neighbors:
        - peerAddress: '300.300.300.405/32'
          peerASN: 64555
          eBGPMultihopTTL: 10
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
          gracefulRestart:
            enabled: true
            restartTimeSeconds: 120

So one policy does the normal thing, for all of the Services where I can control the labels, running on my Ceph and worker hosts. Then there is the second policy, which only applies to the vault service in the vault namespace. With that configuration, I got Cilium to announce both of them.
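
A quick way to verify the result is to check whether the Service actually got its external IP assigned, before looking at the announced routes with cilium bgp routes as mentioned above:

kubectl get svc -n vault vault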

Setting up the certificates for Vault

The next step, accounting for the majority of preparation work, was the certificate setup. The Vault Pods need to be able to access each other, and they should do it over HTTPS. So they need certificates. Initially, I did not realize that and naively just told Vault to use my Let’s Encrypt external cert. But of course the Vault instances need to contact each other directly, not just the active server available via the newvault.example.com address.

So I needed a specific certificate for Vault, with the following three SANs:

  • vault-0.vault-internal
  • vault-1.vault-internal
  • vault-2.vault-internal

And as I noted above, I’ve already got an internal CA. And here is where I knowingly committed a sin: That CA is provided by Vault, via the PKI secrets engine. And I reused that CA for generating the new Vault’s certificates, thereby introducing a dependency cycle into my setup. I feel a bit ashamed for it. But I also don’t want to introduce another complete PKI setup. And reusing the already existing CA has the benefit that the CA cert is already widely deployed in my Homelab.

The first step is to set up a separate role for the certificate, so I can properly separate access to generating certificates later.

I’ve got all of my Vault configuration in Terraform, so I added the new role there as well. Here is the full CA setup:

resource "vault_mount" "my-ca-mount" {
  path = "my-ca"
  type = "pki"
  default_lease_ttl_seconds = 157680000
  max_lease_ttl_seconds = 157680000
}

resource "vault_pki_secret_backend_root_cert" "my-root-cert" {
  backend = vault_mount.my-ca-mount.path
  type = "internal"
  common_name = "My Private Root CA"
  ttl = 157680000
  format = "pem"
  private_key_format = "der"
  key_type = "rsa"
  key_bits = 4096
  exclude_cn_from_sans = true
  ou = "Private"
  organization = "Private"
  country = "DE"
}

resource "vault_pki_secret_backend_config_urls" "my-root-urls" {
  backend = vault_mount.my-ca-mount.path
  issuing_certificates = [
    "https://vault.example.com/v1/my-ca/ca"
  ]
}

resource "vault_pki_secret_backend_role" "vault-certs" {
  backend = vault_mount.my-ca-mount.path
  name = "vault-certs"
  ttl = "15552000"
  max_ttl = "15552000"
  allow_localhost = false
  allowed_domains = [ "newvault.example.com", "vault-internal", "127.0.0.1" ]
  allow_subdomains = true
  allow_ip_sans = true
  allow_wildcard_certificates = false
  allow_bare_domains = true
  key_type = "rsa"
  key_bits = 4096
  organization = ["My Homelab"]
  country = ["DE"]
}

There’s also another role for certs deployed for other purposes, but that’s not important here. I will not go over the base CA and mount configuration and instead concentrate on the vault-certs role.

This role allows creating certificates for the vault-internal domain, covering the three Pods’ internal DNS names, the externally visible address of the Vault cluster at newvault.example.com that the LoadBalancer points to, and the localhost address. The localhost IP is there because that’s how the vault CLI launched inside the Pods contacts the local Vault instance. That use case is required for initialization and unsealing later.

This is the beauty of Terraform at work again. I could of course also do the same using the Vault CLI, but then I would need to document the commands somewhere and remember to update that documentation whenever something changes. With the Infrastructure as Code approach, I make any changes in this definition, so it’s always up-to-date.

After a terraform apply, I could produce a certificate with this command:

vault write -format=json my-ca/issue/vault-certs common_name="newvault.example.com" alt_names="vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal" ttl="4000h" > test.cert

The JSON file this produces looks like this:

{
  "request_id": "5d72d050-8f80-ba4f-1067-b4165cf2d0f5",
  "lease_id": "",
  "lease_duration": 0,
  "renewable": false,
  "data": {
    "ca_chain": [
      "-----BEGIN CERTIFICATE-----\n[...]\n-----END CERTIFICATE-----"
    ],
    "certificate": "-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----",
    "expiration": 1757892246,
    "issuing_ca": "-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----",
    "private_key": "-----BEGIN RSA PRIVATE KEY-----\n[...]\n-----END RSA PRIVATE KEY-----",
    "private_key_type": "rsa",
    "serial_number": "7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e"
  },
  "mount_type": "pki"
}
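
To double-check that all the required SANs actually ended up in the certificate, the certificate field can be extracted from that JSON and piped to openssl, for example like this (assuming jq and openssl are available):

jq -r '.data.certificate' test.cert | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'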

And I then revoked this test cert using the serial_number with this command:

vault write my-ca/revoke serial_number="7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e"

Getting the certificate into Kubernetes

I could then of course just upload the key and certificate into a k8s Secret, but that just doesn’t feel very Kubernetes-y, plus it would be a step I would need to document for future renewals. Instead, I had another look at external-secrets and found the VaultDynamicSecret.

This is another nice feature for getting Vault outputs into k8s Secrets, only this time it’s not static credentials, but a certificate, complete with automatic renewal. And the usage of the PKI secrets engine is even the example used in the docs.

I initially deployed a manifest that looked like this:

apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
metadata:
  name: "vault-certs-generator"
spec:
  path: "my-ca/issue/vault-certs"
  method: "POST"
  parameters:
    common_name: "newvault.example.com"
    alt_names: "vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal"
    ip_sans: "127.0.0.1"
  resultType: "Data"
  provider:
    server: "https://vault.example.com:8200"
    caProvider:
      type: Secret
      name: homelab-ca-cert
      namespace: {{ .Release.Namespace }}
      key: caCert
    auth:
      appRole:
        path: "approle"
        roleId: {{ .Values.approleId }}
        secretRef:
          name: "external-secrets-approle"
          namespace: {{ .Release.Namespace }}
          key: "secretId"

I deployed this manifest in the external-secrets namespace, because that was where the AppRole auth secrets lived.

Then I created the following ExternalSecret in the vault namespace to generate a certificate:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "vault-tls-certs"
spec:
  refreshInterval: "4000h"
  target:
    name: vault-tls-certs
  dataFrom:
  - sourceRef:
      generatorRef:
        apiVersion: generators.external-secrets.io/v1alpha1
        kind: VaultDynamicSecret
        name: "vault-certs-generator"

This didn’t work, and I got an error about the vault-certs-generator not being found. This was because the non-Cluster variants of external-secrets objects are generally only available in the namespace where they were created. So my ExternalSecret in the vault namespace wasn’t able to access the VaultDynamicSecret in the external-secrets namespace.

So I ended up moving the ExternalSecret into the external-secrets namespace as well, just to make sure that it even worked. That introduced me to an authorization error looking something like this:

{"level":"error",
"ts":1742423406.4829333,
"msg":"Reconciler error",
"controller":"externalsecret",
"controllerGroup":"external-secrets.io",
"controllerKind":"ExternalSecret",
"ExternalSecret":{
  "name":"vault-tls-certs",
  "namespace":"external-secrets"
},
"namespace":"external-secrets",
"name":"vault-tls-certs",
"reconcileID":"d6b0b369-f959-479a-8228-f9a8d6fbc5bd",
"error":"error processing spec.dataFrom[0].sourceRef.generatorRef,
err: error using generator: Error making API request.\n\nURL:
PUT https://vault.example.com:8200/v1/my-ca/issue/vault-certs
Code: 403. Errors:\n\n* 1 error occurred:\n\t* permission denied\n\n",
"stacktrace":"..."}

This was due to a mistake I had made in updating the policy for external-secrets to allow it access to the my-ca/issue/vault-certs endpoint. The policy addition I had made for that particular path looked like this:

path "my-ca/issue/vault-certs" {
  capabilities = [ "create" ]
}

That’s what all the examples I could find said. But it did not work. I finally granted all capabilities and then slowly removed them one by one, until I found the culprit: the update capability was missing from the above.
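
So the policy addition that actually works needs both capabilities:

path "my-ca/issue/vault-certs" {
  capabilities = [ "create", "update" ]
}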

After fixing that I finally got a certificate created. But I still had to do something about the fact that the ExternalSecret lived in the external-secrets namespace now, while it was needed in Vault’s namespace.

One option I looked at to resolve this issue is ClusterGenerators. These work similarly to namespaced generators like VaultDynamicSecret, but can be used by ExternalSecrets throughout the cluster. I ended up deciding against that, for simple “doing things properly” reasons: The generator will only ever be needed in the Vault namespace, because it is not a generic generator for TLS certs, but a specific generator restricted to creating certs for Vault.

So I decided to stay with the namespaced VaultDynamicSecret, but change the auth method to Kubernetes.

Setting up Kubernetes auth

Being the Swiss army knife that it is, Vault can also authenticate clients via Kubernetes. The way this works is that you create a role in Vault and assign policies defining what that role can do. Then, certain Kubernetes ServiceAccounts can be allowed to authenticate with that role. During login, Vault expects to receive a Kubernetes JWT, which it verifies against the Kubernetes API to ensure that the token is valid and belongs to one of the ServiceAccounts allowed to use the Vault role.

One problem is that the action of verifying Kubernetes tokens itself also needs a Kubernetes token for the API server access. With a Vault deployed via the Helm chart that’s easy: the vault ServiceAccount created by the chart already has the necessary permissions, and Vault can use that account’s token. Vault will also automatically reload the token periodically, as Kubernetes tokens are generally short-lived.

But at least initially, I need to use my baremetal Vault for the certificate generation, because those are the certs that the k8s Vault deployment will use later. To work around this issue, one could still use long-lived tokens. But another way is to use the JWT of the process that’s trying to use the Vault auth method. This requires some changes in Kubernetes though. Namely, the ServiceAccounts which should authenticate to Vault via Kubernetes auth need to have the system:auth-delegator ClusterRole. This allows their token to be used by another app (here, Vault) to access the TokenReview API and verify that the token is valid. No change was necessary here, because the vault ServiceAccount I will be using already has the auth-delegator role.
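
For reference, granting that role to a ServiceAccount by hand would look roughly like this (the binding name is made up; in my case nothing needed to be applied, since the vault ServiceAccount already had the role):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # hypothetical name; the Helm chart's own binding is named differently
  name: vault-auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: vault
    namespace: vault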

So with that out of the way, here is the Vault Kubernetes auth setup:

resource "vault_auth_backend" "kubernetes" {
  type = "kubernetes"
}

resource "vault_kubernetes_auth_backend_config" "kube-backend-config" {
  backend                = vault_auth_backend.kubernetes.path
  kubernetes_host        = "https://k8s.exmaple.com:6443"
  issuer                 = "api"
  disable_iss_validation = "true"
  kubernetes_ca_cert               = "-----BEGIN CERTIFICATE-----\n [...]"
}

resource "vault_kubernetes_auth_backend_role" "vault-certs" {
  backend                          = vault_auth_backend.kubernetes.path
  role_name                        = "vault-certs"
  bound_service_account_names      = ["vault"]
  bound_service_account_namespaces = ["vault"]
  token_ttl                        = 3600
  token_policies                   = [vault_policy.vault-certs.name]
  token_bound_cidrs                = ["300.300.300.0/24"]
}

In this setup, the Kubernetes auth is created for the k8s API at k8s.example.com:6443. The general k8s info can be found via this command:

kubectl cluster-info

Kubernetes control plane is running at https://k8s.example.com:6443
CoreDNS is running at https://k8s.example.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

To get the kubernetes_ca_cert, you can have a look at the kube-root-ca.crt ConfigMap that should be available in all namespaces, for example like this:

kubectl get -n kube-system configmaps kube-root-ca.crt -o jsonpath="{['data']['ca\.crt']}"

Finally, I’ve also restricted all tokens created by the vault-certs role so that they’re only valid coming from IPs in the Homelab. That’s just a small defense-in-depth measure I like to apply for any tokens in Vault where it’s possible.

Finally setting up the certificate generator

With the authentication now configured properly, the certificate generation can be set up like this:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: "vault-tls-certs"
spec:
  refreshInterval: "4000h"
  target:
    name: vault-tls-certs
  dataFrom:
  - sourceRef:
      generatorRef:
        apiVersion: generators.external-secrets.io/v1alpha1
        kind: VaultDynamicSecret
        name: "vault-certs-generator"
---
apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
metadata:
  name: "vault-certs-generator"
spec:
  path: "my-ca/issue/vault-certs"
  method: "POST"
  parameters:
    common_name: "newvault.example.com"
    alt_names: "vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal"
    ip_sans: "127.0.0.1"
  resultType: "Data"
  provider:
    server: "https://vault.example.com:8200"
    caProvider:
      type: Secret
      name: homelab-ca-cert
      namespace: vault
      key: caCert
    auth:
      kubernetes:
        mountPath: "kubernetes"
        role: "vault-certs"
        serviceAccountRef:
          name: "vault"

With this configuration, the Vault certs are collected (still from the old baremetal Vault), with the Kubernetes authentication using the vault ServiceAccount. This now works without issue, and a certificate usable by the k8s Vault instance is generated.

I’m also setting the refresh interval of the Secret containing the certificate to 4000 hours. This should lead to automatic renewal with some time to spare, as the certificates are issued with the role’s TTL of 15552000 seconds, which comes out to 4320 hours.

One thing to note is that the VaultDynamicSecret also needs the CA certificate. The way I’m currently supplying that one is a bit hacky. I’m deploying Vault with a Helm chart, and I’ve added this to the values.yaml file:

caBundle: |
  {{- exec "curl" (list "https://vault.example.com:8200/v1/my-ca/ca/pem") | nindent 2 }}  

This is a special functionality of the tool I’ve been using to manage all of the Helm charts in my cluster, Helmfile. It can interpret Go templates in the values.yaml file. That line fetches the CA certificate from the Vault endpoint and stores it in the caBundle variable. That is then used to create a Secret with the CA like this:

apiVersion: v1
kind: Secret
metadata:
  name: homelab-ca-cert
stringData:
  caCert: |
    {{- .Values.caBundle | nindent 6 }}    

Initializing Vault

With all those Yaks safely shaven, I could finally go forward with initializing the Kubernetes Vault cluster.

I used this command:

kubectl exec -n vault vault-0 -- vault operator init -key-shares=1 -key-threshold=1 > vault-init.txt

This initialization failed with a certificate error:

Get "https://127.0.0.1:8200/v1/sys/seal-status": tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

Even for local connections, Vault needs a cert. And that’s why I’ve got the 127.0.0.1 IP SAN in the certificate used by Vault.

After I got that issue fixed, I finally successfully initialized the Vault instance, resulting in this information:

Unseal Key 1: abcde123

Initial Root Token: hvs.foobar

Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated root key. Without at least 1 keys to
reconstruct the root key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.

In the initialization command, I told Vault that I only need one key share. Normally, you would split the key into multiple shares so they can be distributed, but that doesn’t make any real sense for a small personal instance. If somebody somehow gets one of the key shares, they would very likely be able to get the others the same way.

It is very important to save the initial root token hvs.foobar. This is needed for the initial configuration, until some policies and other auth methods have been configured.

The next step was then to unseal all three Vault instances with these commands and the unseal key output by the vault init command:

kubectl exec -it -n vault vault-0 -- vault operator unseal
kubectl exec -it -n vault vault-1 -- vault operator unseal
kubectl exec -it -n vault vault-2 -- vault operator unseal
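
Whether the unseal worked can be verified with vault status, which should report Sealed as false and show the HA mode of each instance:

kubectl exec -n vault vault-0 -- vault status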

One interesting thing to note: All of the Vault Pods, as configured by the Helm chart, run with the OnDelete update strategy. This has the effect that no change to the configuration, including e.g. setting new environment variables, will do anything by itself. The Pods always need to be deleted manually for a change to take effect.
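
In practice, rolling out a config change therefore looks roughly like this, one Pod at a time, unsealing each one before moving on to the next:

kubectl delete pod -n vault vault-2
# wait for the new Pod to be Running again, then unseal it
kubectl exec -it -n vault vault-2 -- vault operator unseal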

Configuring Vault logging

I like having my logs all in at least approximately the same format, and so I’ve got a log parsing section for most apps in my FluentD config. Normally I don’t mention this, but Vault is a little bit weird. Namely, it does output its logs as JSON if so configured, which is good. It makes parsing a lot simpler. But, it also adds an @ symbol to the names of most of the JSON keys:

{"@level":"info","@message":"compacting logs","@module":"storage.raft","@timestamp":"2025-04-06T18:11:31.197956Z","from":893122,"to":901345}
{"@level":"info","@message":"snapshot complete up to","@module":"storage.raft","@timestamp":"2025-04-06T18:11:31.235460Z","index":911585}

And I’ve got no idea why. Or how it decides which keys get the @ and which do not. It made my log parsing a little bit more complicated. It now looks like this:

# Log config for the Vault deployment
<filter services.vault.vault>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key "@timestamp"
      time_type string
      time_format %iso8601
      utc true
    </pattern>
    <pattern>
      format regexp
      expression /^(?<msg>.*)$/
      time_key nil
    </pattern>
  </parse>
</filter>

<filter services.vault.vault>
  @type record_modifier
  remove_keys _dummy_,@level
  <record>
    _dummy_ ${record["level"] = record["@level"] if record.key?("@level")}
  </record>
</filter>

The first filter does the main parsing, while the second one specifically removes the @ in front of the level entry in the log object, because that’s the key where my setup expects to see the log level.

Another weird thing, where Vault is by far not the biggest offender, is apps which log in multiple different formats. That’s why the first filter has a multi_format parser. For reasons I’m not sure about, Vault outputs some general information at the beginning of the log, during startup, where it doesn’t respect the log format configuration:

==> Vault server configuration:

Administrative Namespace:
             Api Address: https://10.8.1.61:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
   Environment Variables: HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, TERM, VAULT_ADDR, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_LOG_FORMAT, VAULT_LOG_LEVEL, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_RAFT_NODE_ID, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_HTTPS, VAULT_SERVICE_PORT_HTTPS_INTERNAL, VERSION
              Go Version: go1.23.6
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", disable_request_limiter: "false", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: debug
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.18.5, built 2025-02-24T09:40:28Z
             Version Sha: 2cb3755273dbd63f5b0f8ec50089b57ffd3fa330

==> Vault server started! Log data will stream in below:

Why output that in plain text, instead of also putting it into JSON? It seems to be a quirk of all of HashiCorp’s tools; Nomad and Consul do the same thing, if I remember correctly.

Migrating to the new Vault instance

With the instance on Kubernetes now configured, I needed to migrate the data to that instance. Sadly, there’s not really a good way to migrate K/V store entries in particular from one Vault to another. So I just went with a manual migration.

Running Terraform against the new instance

As I’ve mentioned before, I’m using Terraform for a lot of the configuration for Vault, because that is preferable to keeping a list of commands in my docs.

But the issue was: I also needed to keep the configuration for the old Vault instance, because I needed to keep that one running during the migration as well.

So I started out with just adding the second Vault as another provider to my Terraform config, via provider aliases. It looked like this:

provider "vault" {
  address = "https://vault.example.com:8200"
}

provider "vault" {
  alias = "k8s"
  address = "https://newvault.example.com:8200"
}

This allows me to keep configurations for two Vault instances in the same Terraform state. I initially only created the userpass auth method for the new Vault, to verify that the Terraform setup worked:

resource "vault_auth_backend" "userpass" {
  type = "userpass"
  path = "userpass"
  local = false
}

resource "vault_auth_backend" "userpass-k8s" {
  provider = vault.k8s
  type = "userpass"
  path = "userpass"
  local = false
}

With the provider setting, I could choose which Vault provider config I wanted to use.

But trying a terraform apply with this configuration resulted in an error:

│ Error: failed to lookup token, err=Error making API request.
│
│ URL: GET https://vault.example.com:8200/v1/auth/token/lookup-self
│ Code: 403. Errors:
│
│ * 2 errors occurred:
│       * permission denied
│       * invalid token

This confused me - until I remembered that I had set the Vault root token for the new k8s Vault in the terminal I was running the command in. Running my customary vault login -method=userpass username=myuser in another shell and executing the terraform apply there of course also didn’t work, because now it only had the Vault token needed for the old Vault instance.

A quick look into the Vault Terraform provider documentation led to the solution. I could configure one provider with a filepath to a token file. That would be the provider for the old Vault instance. Then I would leave the provider for the new Vault unconfigured, which means it would continue to use the VAULT_TOKEN environment variable. The resulting provider config looked like this:

provider "vault" {
  address = "https://vault.example.com:8200"
  auth_login_token_file {
    filename = "/home/myuser/.vault-token"
  }
}

provider "vault" {
  alias = "k8s"
  address = "https://newvault.example.com:8200"
}

I would then first run the login for the old provider:

vault login -method=userpass username=myuser

Then, in the same terminal, I would set the VAULT_TOKEN variable to the root token of the new Vault:

export VAULT_TOKEN="hvs.foobar"

And with that, I was able to run terraform apply without issue, and both Vault instances were configurable.

Next, I needed to stop using the root token for the new instance and instead create a userpass login for that one as well. This I had to do on the command line, because the Terraform resource for creating a userpass user requires the password as part of the Terraform config, and I really did not want that. So I created the user manually:

vault write auth/userpass/users/myuser password=- token_policies=admin token_ttl=4h token_max_ttl=4h token_bound_cidrs=300.300.300.12 token_type="default"

This will create the user myuser in the userpass backend and will ask for the password on the command line. Tokens issued by this auth method for the user will be valid for four hours and will only be valid when used from the 300.300.300.12 source IP, which is my Command & Control host.

Now, instead of exporting the root token in the VAULT_TOKEN variable, I could issue this command to instead get a token for the myuser role:

export VAULT_TOKEN=$(vault login -method=userpass -token-only username=myuser)

Migrating K/V secrets

After I had the userpass login configured, I could just copy+paste all of the Terraform resources for my Vault setup, add the provider = vault.k8s option, and one terraform apply later, most configuration was migrated to the new Vault instance on Kubernetes.

The only problem was the K/V secrets. Those are not in Terraform, because that would have required me to put my secrets into the Terraform config files and the Terraform state. After searching around a little, it looked like there was no official way to run a migration of K/V secrets, so I came up with my own.

First, I would export the data field, which contains the actual secrets, as opposed to some metadata, from the old Vault:

vault kv get -field data -format json secret/topsecret/database-creds > out.json

That would give me a JSON file like this:

{
  "username": "foo",
  "password": "bar"
}

That could then be imported into the new Vault like this:

vault kv put secret/topsecret/database-creds @out.json

I did that exactly 59 times, and all of my secrets were successfully migrated over.
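
In hindsight, a small loop would have saved some of that copy+pasting. A rough sketch of what that might look like, assuming OLD_TOKEN and NEW_TOKEN (hypothetical variable names) hold valid tokens for the respective instances and the secret paths are listed by hand:

# copy each KV path from the old Vault to the new one
for p in secret/topsecret/database-creds secret/s3_users/blog; do
  VAULT_ADDR=https://vault.example.com:8200 VAULT_TOKEN=$OLD_TOKEN \
    vault kv get -field data -format json "$p" > out.json
  VAULT_ADDR=https://newvault.example.com:8200 VAULT_TOKEN=$NEW_TOKEN \
    vault kv put "$p" @out.json
done
rm -f out.json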

Update playbook changes

Another interesting piece of code I would like to talk about is my Homelab host update Ansible playbook. This playbook runs updates of the host OS, Ubuntu Server in my case, including automatic reboots and k8s node drains. But I would need to manually unseal the Vault Pods once their host was updated and rebooted. For that, I just have an Ansible task output the command I can copy+paste into another terminal to do the unseal.

This was pretty simple up to now, with the baremetal Vault: I could directly contact the host being updated, because the Vault instance on it would be the one needing the unseal. But with Vault in k8s, there’s no obvious way to determine which of the three Vault Pods runs on the host currently being updated. I needed an approach to find the right container.

The first step is to wait for the local Vault Pod on the rebooted machine to come up again, so that it would even accept the unseal command. I did that with the following task:

- name: wait for vault to be running
  tags:
    - kubernetes
  delegate_to: candchost
  become_user: candcuser
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: vault
    label_selectors:
      - app.kubernetes.io/name=vault
      - app.kubernetes.io/instance=vault
    field_selectors:
      - "spec.nodeName={{ ansible_hostname }}"
    wait: true
    wait_condition:
      status: "True"
      type: "Ready"
    wait_sleep: 10
    wait_timeout: 300
  register: vault_pod_list

This task uses the Kubernetes Ansible collection to have Ansible wait for the Vault Pod to be in Ready state. I’m also saving the list of discovered Vault Pods in a variable for later use. This task would only wait for the Vault Pod on the host currently being updated, via the field selector.

Short aside: This also taught me that I could do the following:

kubectl get pods -A --field-selector "spec.nodeName=mynode"

Instead of kubectl get pods -A -o wide | grep mynode. After over a year of running Kubernetes in my Homelab. 🤦

But let’s move on. I now had the name of the Vault Pod on the rebooted host in the vault_pod_list variable, which allowed me to output a command line I could copy+paste to unseal the Vault instance:

- name: unseal vault prompt
  tags:
    - vault
  pause:
    echo: true
    prompt: "Please unseal vault: kubectl exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal"

This is a pretty convenient way to integrate manual operations into an Ansible play and works quite well. I see this prompt, copy the line and unseal the Pod, and then I just hit <Return> in the shell where Ansible is running and the play will continue.

Switching the certs over to the new Vault’s CA

If you remember from further up (and I won’t be mad if you don’t, looking at the length of this post…), I was using the baremetal Vault instance to generate the certificates for the new Vault instance. But this also meant that those certs were relying on the old Vault’s CA.

The first step was to update the CA cert in the k8s Secret used for the VaultDynamicSecret for the Vault certificate, which I did by changing the line in my values.yaml file fetching the CA:

{{- exec "curl" (list "-k" "https://newvault:8200/v1/my-ca/ca/pem") | nindent 2 }}

This did not have any direct effect on anything. The certificate Secret has a refresh interval of 4000 hours, so it won’t try to recreate the certs anytime soon. At the same time, Vault won’t automatically reload a new CA either, so everything was fine.

Then I went into the VaultDynamicSecret and updated the Vault URL from the old to the new Vault. This regenerated the Vault certificates. But again, Vault itself doesn’t react to that, so Vault was still up and running without issue.

Then I sent SIGHUP to each Vault instance in turn, which triggered a configuration reload, including fresh certificates.

kubectl exec -it -n vault vault-0 -- sh
kill -SIGHUP $(pidof vault)

And that’s it. I added annotations to a couple of ExternalSecrets to trigger refreshes and make sure it all worked, and it did: external-secrets successfully got the secrets from the new k8s Vault instance.

This was quite a lot more work than I thought, but it was also the second-to-last part of the migration. Now, the only thing still missing is to migrate the control plane off of the VMs on my extension host and onto the three Raspberry Pi 4 which previously served as controller nodes and are now empty, thanks to the baremetal Vault having been shut down.

But it’s Monday evening now, and the controller migration is more a weekend task, because it also includes moving the MONs of the Rook Ceph cluster, and that will need some full cluster restarts.