Wherein I migrate my HashiCorp Vault instance to the Kubernetes cluster.
This is part 25 of my k8s migration series.
Look at all this Yak wool. That’s how much it takes to migrate Vault from baremetal to a Kubernetes deployment. I’ve been going back and forth for quite a while, trying to decide what to do with my Vault instance. It’s the one piece of HashiCorp software I do not currently plan to get rid of. But there was a problem: My Vault, or rather the High Availability nature of it, relied on HashiCorp’s Consul and its DNS service discovery functionality. And while I did want to keep Vault, I did not want to keep Consul. And I also didn’t really want to introduce some other sort of method, like HAProxy.
In the end, I sat down and thought quite hard for quite a while, mostly thinking about potential reasons for why I should not move Vault to the Kubernetes cluster. My main worry is bootstrapping - what happens if my entire Homelab goes down, unplanned, and all at once? Be it because I stumble over the absolutely wrong cable, or because my oven develops a short again and throws the main fuse. Could I still get my Homelab back up and do any massaging it might need?
I ended up deciding that Vault on Kubernetes should be fine. All Kubernetes Secrets are synced into the cluster anyway, and any other secrets I might need also live in my password manager. It should be fine. Watch this space for the day I find out what I overlooked. 😅
And thus began the Yak shaving.
Vault
But before we start onto that mountain of wool, let’s take a short detour and look at what Vault is and what I use it for. Brought down to the simplest terms, HashiCorp’s Vault is an API server for secrets of many, many different kinds. It supports everything from simple key-value secrets to PKI certificates. It can also serve short-lived tokens, including for HashiCorp’s other products like Consul or Nomad. I used it for a number of things over the years.
For me, the most important part is the KV store. It stores all manner of passwords, keys and certificates, like my public cert, and makes all of those available, given proper authorization, over HTTP. I use secrets from this store in my Ansible playbooks, for the Mastodon secrets via external-secrets in my Kubernetes cluster, and in my image generation setup for new hosts. Support for it is also very widespread: in HashiCorp’s own tools of course, but also in other tools like Ansible, where you shouldn’t confuse it with Ansible’s own Vault secret store.
In the past, I also used the Nomad secrets engine to get a short-lived token for Nomad API access for my backup solution.
Another big use case for me is as an internal, self-signed CA. During my Nomad/Vault/Consul cluster days, this was pretty important functionality, because those self-signed certs were used by all three components of my Homelab to secure their HTTPS communication. I’ve even gone to the length of installing the CA on all of my devices, so I don’t get any untrusted certificate warnings when accessing services secured with that CA. Since the introduction of Kubernetes, I’m not using the Homelab CA quite as much, but there are still a few internal things secured with it.
For a short while, I even considered using Vault as my OIDC identity provider, but in the end I decided against it. My main reason was that I would have needed to expose my internal secret store to the public internet, because I intended to use OIDC for some public sites. Even though I’ve got no reason to distrust HashiCorp’s security practices, and I could have made only certain paths publicly accessible, that was a step too far for me.
So what does working with Vault actually look like? The main interface is the Vault CLI executable. You can control anything you need from the command line. But it also provides a WebUI, if that’s more your cup of tea. I never bothered with it.
The first step of working with Vault is to obtain a token for all further tasks. For this, Vault offers a plethora of auth methods, ranging from good old username/password to OIDC or TLS certs. I’m using the userpass method, which is just that: username+password. It’s comfortable for me: I can use my password manager and just copy+paste the password in. It looks something like this:
vault login -method=userpass username=myuser
Password (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.
Key Value
--- -----
token hvs.CAESII0RlV4BS_5_A2q8mIpzYxiye0XoE-_Vvlb0YIAYfl-6Gh4KHGh2cy5sSmpvZk5QMXN2QW0wZ0c0R1A3cXV3TkQ
token_accessor 5ofJhWq55yZGOk6CJVRyBacd
token_duration 4h
token_renewable true
token_policies ["admin" "default"]
identity_policies []
policies ["admin" "default"]
token_meta_username myuser
Don’t worry, this token has long since expired. 🙂
When you use vault login, Vault automatically puts the received token into a file at ~/.vault-token. The vault CLI, as well as other things with Vault integration, check that path as well.
As you’d expect from a properly secured application, the tokens you’re getting have a restricted TTL. How long a token is initially valid can be configured, in addition to enabling token renewal and defining an upper bound on how long a token can live under any circumstances.
Then there’s also the policies. Those define what the holder of a token can
actually do with it. In this case, I’m having the default
and admin
policies.
The default
policy mostly allows the holder to access information about the
token they’re using, while admin
is my admin policy, allowing full access to
Vault. It looks something like this:
path "sys/health"
{
capabilities = ["read", "sudo"]
}
# Create and manage ACL policies broadly across Vault
# List existing policies
path "sys/policies/acl"
{
capabilities = ["list"]
}
# Create and manage ACL policies
path "sys/policies/acl/*"
{
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
# Enable and manage authentication methods broadly across Vault
# Manage auth methods broadly across Vault
path "auth/*"
{
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
# Create, update, and delete auth methods
path "sys/auth/*"
{
capabilities = ["create", "update", "delete", "sudo"]
}
# List auth methods
path "sys/auth"
{
capabilities = ["read"]
}
# Enable and manage the key/value secrets engine at `secret/` path
# List, create, update, and delete key/value secrets
path "secret/*"
{
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
# Manage secrets engines
path "sys/mounts/*"
{
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
# Manage secrets engines
path "sys/remount"
{
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
# List existing secrets engines.
path "sys/mounts"
{
capabilities = ["read"]
}
# Homenet Root CA access
path "homenet-ca*" {
capabilities = [ "create", "read", "update", "delete", "list", "sudo" ]
}
Armed with this token, I can then for example take a look at my secrets:
vault read secret/s3_users/blog
Key Value
--- -----
refresh_interval 768h
access abcde
custom_metadata map[managed-by:external-secrets]
secret 12345
This is a pretty nice example, in fact. It shows that the blog secret consists of two entries, access and secret, containing the standard S3 credentials. But it also has custom_metadata indicating that it wasn’t actually created by me by hand, but was pushed into Vault via an external-secrets PushSecret. I’m doing this because I need the S3 credentials for my blog both in an Ansible playbook I use to configure S3 buckets and in the K8s cluster, because that’s where the bucket and credentials are created by Rook Ceph.
To put that same secret into Vault, the following command line could be used:
vault kv put secret/s3_users/blog access=abcde secret=12345
This would of course have the downside of putting the secret into the shell history, unless a space is added at the front. If you’d prefer having Vault take the secret from stdin, you can run the same command like this:
vault kv put secret/s3_users/blog access=abcde secret=-
This will take the access key from the parameter, but for the secret, it will ask you for the value, which keeps it out of the shell history. But this approach also has a downside: only one key can use - as its input. If you have more than one actually secret parameter, you can instead put all of them into a JSON file. I will demonstrate that later on when I migrate my Vault content from my baremetal instance to the Kubernetes deployment.
If you want to use Vault values from within Ansible, I’ve found the Vault lookup pretty nice to use. It can be used like this, to set a variable in a playbook:
- hosts: all
name: Demonstration
tags:
- demo
vars:
s3_access: "{{ lookup('hashi_vault', 'secret=secret/s3_users/blog:access token='+vault_token+' url='+vault_url) }}"
s3_secret: "{{ lookup('hashi_vault', 'secret=secret/s3_users/blog:secret token='+vault_token+' url='+vault_url) }}"
I’m setting the vault_token
with Ansible’s file lookup like this:
vault_token: "{{ lookup('file', '/home/my_user/.vault-token') }}"
And because that file is automatically updated when the vault login command is used, I’m getting the current token automatically.
I will go into a bit more detail about generating certificates later as part of the Vault k8s setup.
Setting up the Helm chart
Alright. Let the Yak shaving finally commence. First of all, it’s notable that there is no official way to migrate the contents of one Vault instance to another. So I had to set up a completely new instance of Vault on k8s, instead of doing some sort of in-place migration.
So the first step was to configure and deploy the official Helm chart, following this guide.
And here is the result:
global:
enabled: true
tlsDisable: false
openshift: false
serverTelemetry:
prometheusOperator: false
injector:
enabled: false
server:
enabled: true
logLevel: debug
logFormat: json
resources:
requests:
memory: 500Mi
cpu: 500m
limits:
memory: 500Mi
ingress:
enabled: false
readinessProbe:
enabled: false
path: "/v1/sys/health?standbyok=true&sealedcode=204"
livenessProbe:
enabled: true
path: "/v1/sys/health?standbyok=true"
initialDelaySeconds: 600
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
nodeSelector:
homelab/role: "controller"
networkPolicy:
enabled: false
priorityClassName: "system-cluster-critical"
extraLabels:
homelab/app: vault
homelab/part-of: vault
service:
enabled: true
active:
enabled: false
standby:
enabled: false
type: "LoadBalancer"
externalTrafficPolicy: "Local"
annotations:
external-dns.alpha.kubernetes.io/hostname: newvault.example.com
io.cilium/lb-ipam-ips: 300.300.300.12
includeConfigAnnotation: true
dataStorage:
enabled: true
size: "1Gi"
storageClass: rbd-fast
auditStorage:
enabled: false
dev:
enabled: false
extraVolumes:
- type: secret
name: vault-tls-certs
extraEnvironmentVars:
VAULT_CACERT: "/vault/userconfig/vault-tls-certs/issuing_ca"
standalone:
enabled: false
ha:
enabled: true
raft:
enabled: true
setNodeId: true
config: |
cluster_name = "vault-k8s"
ui = false
disable_mlock = false
listener "tcp" {
address = "[::]:8200"
cluster_address = "[::]:8201"
tls_cert_file = "/vault/userconfig/vault-tls-certs/certificate"
tls_key_file = "/vault/userconfig/vault-tls-certs/private_key"
}
storage "raft" {
path = "/vault/data"
retry_join {
leader_api_addr = "https://vault-0.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
retry_join {
leader_api_addr = "https://vault-1.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
retry_join {
leader_api_addr = "https://vault-2.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
}
service_registration "kubernetes" {}
replicas: 3
ui:
enabled: false
csi:
enabled: false
serverTelemetry:
serviceMonitor:
enabled: false
Let me explain. I’m disabling the Ingress because I will make Vault accessible via a LoadBalancer instead. There’s no need to push it through Traefik, and using it through Traefik would just mean one more service that needs to be up and running before Vault is accessible.
Next, I’m configuring the liveness probe. It needs to be reconfigured to make
sure that Vault also returns a 200
result when the pod being probed is in
standby. See also these docs.
And while setting Vault up, before initializing the cluster, just do yourself
the favor and completely disable the probes, or at least increase the initialDelaySeconds
,
to prevent restarts while you’re in the process of initializing the cluster.
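To see what the probes actually see, you can query the health endpoint directly. Without parameters it returns 200 for an unsealed active node, 429 for an unsealed standby, 503 for a sealed node and 501 for an uninitialized one; the standbyok and sealedcode parameters in the probe paths above change exactly those codes. A quick check from the outside could look something like this (-k skips cert verification, fine for a quick check):
curl -sk -o /dev/null -w '%{http_code}\n' "https://newvault.example.com:8200/v1/sys/health?standbyok=true"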
Next, I’m adding a toleration for control-plane
nodes. This is mostly because
those are the only nodes with local storage, so they will be up first.
And then we come to the first problem with the chart, the service setup. In my
k8s cluster, I’m using Cilium’s BGP-based LoadBalancer support. And that requires
the Services Cilium looks at to have a specific, configurable label. But the
Vault Helm chart does not allow setting labels for the Services it creates.
Perhaps my use case is just really niche?
Anyway, I’m enabling only the generic vault
Service, set it to LoadBalancer
and, importantly, set the externalTrafficPolicy
to Local
. This means that
packets arriving for Vault will directly reach the node where the active Vault
Pod is running, instead of getting forwarded there by other nodes.
This is particularly important for Vault, because Vault can configure tokens
to be valid only when they’re coming from certain IPs. This won’t work when
the source IP looks like it’s coming from another k8s node, instead of the actual
source host.
I’m also setting a fixed IP to assign to the LoadBalancer, so I can easily set a
few choice firewall rules for access to that LoadBalancer.
After the Service configs follows the HA configuration. In this setup, there can be multiple Vault servers, continuously exchanging data. When the currently active server goes down, another one can take over, and the previous leader joins the standby pool once it’s back. Note that this is a high availability setup, not a load balancing setup. When a request reaches a standby server, it is forwarded to the current active server. The standby server never answers any requests itself, besides those to the health endpoint, of course. In my case, the HA config mostly consists of a config snippet for the Vault config file:
cluster_name = "vault-k8s"
ui = false
disable_mlock = false
listener "tcp" {
address = "[::]:8200"
cluster_address = "[::]:8201"
tls_cert_file = "/vault/userconfig/vault-tls-certs/certificate"
tls_key_file = "/vault/userconfig/vault-tls-certs/private_key"
}
storage "raft" {
path = "/vault/data"
retry_join {
leader_api_addr = "https://vault-0.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
retry_join {
leader_api_addr = "https://vault-1.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
retry_join {
leader_api_addr = "https://vault-2.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-tls-certs/issuing_ca"
}
}
service_registration "kubernetes" {}
Most interesting here is the retry_join configuration, which needs to contain the CA used to sign the TLS cert used in the listener stanza. I will explain this more deeply in the next section, where I set up the cert generation.
Once that Helm chart was deployed, a couple of things went wrong, leading to some beautiful Yak shaving.
Setting up the CiliumBGPPeeringPolicy
As I’ve noted above, the labels of the main Vault Service cannot be changed. Interestingly, the two other services, for active and standby servers, do have the option of configuring their labels. But not the main service. Another issue: The type of the Services can only be set centrally, for all three Services.
As you can read in a bit more detail here, I’m using Cilium’s BGP-based support for setting up LoadBalancer type services.
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
name: worker-node-bgp
namespace: kube-system
spec:
nodeSelector:
matchExpressions:
- key: "homelab/role"
operator: In
values:
- "worker"
- "ceph"
virtualRouters:
- localASN: 64555
exportPodCIDR: false
serviceSelector:
matchExpressions:
- key: "homelab/public-service"
operator: In
values:
- "true"
neighbors:
- peerAddress: '300.300.300.405/32'
peerASN: 64555
eBGPMultihopTTL: 10
connectRetryTimeSeconds: 120
holdTimeSeconds: 90
keepAliveTimeSeconds: 30
gracefulRestart:
enabled: true
restartTimeSeconds: 120
The main problem here is the spec.virtualRouters[0].serviceSelector, as it only allows matching on labels - and I cannot influence the labels set on the Vault Service. I then took a very close look at the Cilium docs and found out that the selector can also select on the Service name and namespace. So I tried extending the above config like this, adding another entry in the virtualRouters list:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
name: worker-node-bgp
namespace: kube-system
spec:
nodeSelector:
matchExpressions:
- key: "homelab/role"
operator: In
values:
- "worker"
- "ceph"
virtualRouters:
- localASN: 64555
exportPodCIDR: false
serviceSelector:
matchExpressions:
- key: "homelab/public-service"
operator: In
values:
- "true"
neighbors:
- peerAddress: '300.300.300.405/32'
peerASN: 64555
eBGPMultihopTTL: 10
connectRetryTimeSeconds: 120
holdTimeSeconds: 90
keepAliveTimeSeconds: 30
gracefulRestart:
enabled: true
restartTimeSeconds: 120
- localASN: 64555
exportPodCIDR: false
serviceSelector:
matchExpressions:
- key: "io.kubernetes.service.name"
operator: In
values:
- "vault"
- key: "io.kubernetes.service.namespace"
operator: In
values:
- "vault"
neighbors:
- peerAddress: '300.300.300.405/32'
peerASN: 64555
eBGPMultihopTTL: 10
connectRetryTimeSeconds: 120
holdTimeSeconds: 90
keepAliveTimeSeconds: 30
gracefulRestart:
enabled: true
restartTimeSeconds: 120
But this did not work at all. Cilium announced either only the Services which matched the first serviceSelector or the ones matching the second, but never both. When debugging this issue, you can use cilium bgp routes to show which routes Cilium advertises to neighbors.
What did end up working was to introduce two peering policies. The hosts the two policies apply to also seem to need to be non-overlapping, or it again won’t work. I’ve got it configured like this now:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
name: worker-node-bgp
namespace: kube-system
spec:
nodeSelector:
matchExpressions:
- key: "homelab/role"
operator: In
values:
- "worker"
- "ceph"
virtualRouters:
- localASN: 64555
exportPodCIDR: false
serviceSelector:
matchExpressions:
- key: "homelab/public-service"
operator: In
values:
- "true"
neighbors:
- peerAddress: '300.300.300.405/32'
peerASN: 64555
eBGPMultihopTTL: 10
connectRetryTimeSeconds: 120
holdTimeSeconds: 90
keepAliveTimeSeconds: 30
gracefulRestart:
enabled: true
restartTimeSeconds: 120
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
name: controller-node-bgp
namespace: kube-system
spec:
nodeSelector:
matchExpressions:
- key: "homelab/role"
operator: In
values:
- "controller"
virtualRouters:
- localASN: 64555
exportPodCIDR: false
serviceSelector:
matchExpressions:
- key: "io.kubernetes.service.name"
operator: In
values:
- "vault"
- key: "io.kubernetes.service.namespace"
operator: In
values:
- "vault"
neighbors:
- peerAddress: '300.300.300.405/32'
peerASN: 64555
eBGPMultihopTTL: 10
connectRetryTimeSeconds: 120
holdTimeSeconds: 90
keepAliveTimeSeconds: 30
gracefulRestart:
enabled: true
restartTimeSeconds: 120
So one policy does the normal thing, for all of the Services where I can control the labels, running on my Ceph and worker hosts. Then there is the second policy, which only applies to the vault Service in the vault namespace. With that configuration, I got Cilium to announce both of them.
Setting up the certificates for Vault
The next step, accounting for the majority of preparation work, was the certificate setup. The Vault Pods need to be able to access each other, and they should do it over HTTPS. So they need certificates. Initially, I did not realize that and naively just told Vault to use my Let’s Encrypt external cert. But of course the Vault instances need to contact each other directly, not just the active server available via the newvault.example.com address.
So I needed a specific certificate for Vault, with the following three SANs:
vault-0.vault-internal
vault-1.vault-internal
vault-2.vault-internal
And as I noted above, I’ve already got an internal CA. And here is where I knowingly committed a sin: that CA is provided by Vault, via the PKI secrets engine. And I reused that CA to generate the certificates for the new Vault, knowingly introducing a dependency cycle into my setup. I feel a bit ashamed about it. But I also don’t want to introduce another complete PKI setup. And reusing the already existing CA has the benefit that the CA cert is already widely deployed in my Homelab.
The first step is to set up a separate role for the certificate, so I can properly separate access to generating certificates later.
I’ve got all of my Vault configuration in Terraform, so I added the new role there as well. Here is the full CA setup:
resource "vault_mount" "my-ca-mount" {
path = "my-ca"
type = "pki"
default_lease_ttl_seconds = 157680000
max_lease_ttl_seconds = 157680000
}
resource "vault_pki_secret_backend_root_cert" "my-root-cert" {
backend = vault_mount.my-ca-mount.path
type = "internal"
common_name = "My Private Root CA"
ttl = 157680000
format = "pem"
private_key_format = "der"
key_type = "rsa"
key_bits = 4096
exclude_cn_from_sans = true
ou = "Private"
organization = "Private"
country = "DE"
}
resource "vault_pki_secret_backend_config_urls" "my-root-urls" {
backend = vault_mount.my-ca-mount.path
issuing_certificates = [
"https://vault.example.com/v1/my-ca/ca"
]
}
resource "vault_pki_secret_backend_role" "vault-certs" {
backend = vault_mount.my-ca-mount.path
name = "vault-certs"
ttl = "15552000"
max_ttl = "15552000"
allow_localhost = false
allowed_domains = [ "newvault.example.com", "vault-internal", "127.0.0.1" ]
allow_subdomains = true
allow_ip_sans = true
allow_wildcard_certificates = false
allow_bare_domains = true
key_type = "rsa"
key_bits = 4096
organization = ["My Homelab"]
country = ["DE"]
}
There’s also another role for certs deployed for other purposes, but that’s not
important here. I will not go over the base CA and mount configuration and
instead concentrate on the vault-certs
role.
This role allows creating certificates for the vault-internal
domain, covering
the three Pod’s internal DNS names, the externally visible address of the Vault
cluster that the LoadBalancer points to at newvault.example.com
and the
localhost address. The localhost IP is there because that’s how the vault
CLI
launched inside the Pods contacts the local Vault instance. That use case is
required for initialization and unsealing later.
This is the beauty of Terraform at work again. I could of course also do the same using the Vault CLI, but then I would need to document the commands somewhere and remember to update that documentation whenever something changes. With the Infrastructure as Code approach, I make any changes in this definition, so it’s always up to date.
After a terraform apply, I could produce a certificate with this command:
vault write -format=json my-ca/issue/vault-certs common_name="newvault.example.com" alt_names="vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal" ttl="4000h" > test.cert
The JSON file that produces looks like this:
{
"request_id": "5d72d050-8f80-ba4f-1067-b4165cf2d0f5",
"lease_id": "",
"lease_duration": 0,
"renewable": false,
"data": {
"ca_chain": [
"-----BEGIN CERTIFICATE-----\n[...]\n-----END CERTIFICATE-----"
],
"certificate": "-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----",
"expiration": 1757892246,
"issuing_ca": "-----BEGIN CERTIFICATE-----\n [...] \n-----END CERTIFICATE-----",
"private_key": "-----BEGIN RSA PRIVATE KEY-----\n[...]\n-----END RSA PRIVATE KEY-----",
"private_key_type": "rsa",
"serial_number": "7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e"
},
"mount_type": "pki"
}
And I then revoked this test cert using the serial_number with this command:
vault write my-ca/revoke serial_number="7d:fe:7e:97:c1:56:96:eb:3d:27:e8:ee:48:78:82:bd:ca:f8:0d:7e"
Getting the certificate into Kubernetes
I could then of course just upload the key and certificate into a k8s Secret, but that just doesn’t feel very Kubernetes-y, plus it would be a step I would need to document for future renewals. Instead, I had another look at external-secrets and found the VaultDynamicSecret.
This is another nice feature for getting Vault outputs into k8s Secrets, only this time it’s not static credentials, but a certificate, complete with automatic renewal. And the usage of the PKI secrets engine is even the example used in the docs.
I initially deployed a manifest that looked like this:
apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
metadata:
name: "vault-certs-generator"
spec:
path: "my-ca/issue/vault-certs"
method: "POST"
parameters:
common_name: "newvault.example.com"
alt_names: "vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal"
ip_sans: "127.0.0.1"
resultType: "Data"
provider:
server: "https://vault.example.com:8200"
caProvider:
type: Secret
name: homelab-ca-cert
namespace: {{ .Release.Namespace }}
key: caCert
auth:
appRole:
path: "approle"
roleId: {{ .Values.approleId }}
secretRef:
name: "external-secrets-approle"
namespace: {{ .Release.Namespace }}
key: "secretId"
I deployed this manifest in the external-secret’s namespace, because that was where the AppRole auth secrets lived.
Then I created the following ExternalSecret in the vault namespace to generate a certificate:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: "vault-tls-certs"
spec:
refreshInterval: "4000h"
target:
name: vault-tls-certs
dataFrom:
- sourceRef:
generatorRef:
apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
name: "vault-certs-generator"
This didn’t work, and I got an error about the vault-certs-generator
not
being found. This was because the non-Cluster variants of external-secrets
objects are generally only available in the namespace where they were created.
So my ExternalSecret in the vault
namespace wasn’t able to access the
VaultDynamicSecret in the external-secrets namespace.
So I ended up moving the ExternalSecret into the external-secrets namespace as well, just to make sure that it even works. That introduced me to an authorization error looking something like this:
{"level":"error",
"ts":1742423406.4829333,
"msg":"Reconciler error",
"controller":"externalsecret",
"controllerGroup":"external-secrets.io",
"controllerKind":"ExternalSecret",
"ExternalSecret":{
"name":"vault-tls-certs",
"namespace":"external-secrets"
},
"namespace":"external-secrets",
"name":"vault-tls-certs",
"reconcileID":"d6b0b369-f959-479a-8228-f9a8d6fbc5bd",
"error":"error processing spec.dataFrom[0].sourceRef.generatorRef,
err: error using generator: Error making API request.\n\nURL:
PUT https://vault.example.com:8200/v1/my-ca/issue/vault-certs
Code: 403. Errors:\n\n* 1 error occurred:\n\t* permission denied\n\n",
"stacktrace":"..."}
This was due to a mistake I had made in updating the policy for external-secrets to allow it access to the my-ca/issue/vault-certs endpoint. The policy addition I had made for that particular path looked like this:
path "my-ca/issue/vault-certs" {
capabilities = [ "create" ]
}
That’s what all the examples I could find said. But it did not work. I finally
added all permissions and then slowly removed the capabilities one by one,
until I arrived at only the update
capability missing from the above.
After fixing that I finally got a certificate created. But I still had to do something about the fact that the ExternalSecret lived in the external-secrets namespace now, while it was needed in Vault’s namespace.
One option I looked at to resolve this issue is ClusterGenerators. These work similarly to namespaced generators like VaultDynamicSecret, but can be used by ExternalSecrets throughout the cluster. I ended up deciding against that, for simple “doing things properly” reasons: the generator will only ever be needed in the Vault namespace, because it is not a generic generator for TLS certs, but a specific generator restricted to creating certs for Vault.
So I decided to stay with the namespaced VaultDynamicSecret, but change the auth method to Kubernetes.
Setting up Kubernetes auth
Being the Swiss army knife that it is, Vault can also authenticate clients against Kubernetes. The way this works is that you create a role in Vault and assign policies defining what that role can do. Then, certain Kubernetes ServiceAccounts can be allowed to authenticate with that role. During authentication, Vault expects to receive a Kubernetes JWT token, and it then contacts the Kubernetes API to verify that the token is valid and belongs to one of the ServiceAccounts allowed to use the Vault role.
One problem is that the action of verifying Kubernetes tokens itself also needs a Kubernetes token for the API server access. With a Vault deployed via the Helm chart that’s easy: the vault ServiceAccount created by the chart already has the necessary permissions, and Vault can use that account’s token. Vault will also automatically reload the token periodically, as Kubernetes tokens are generally short-lived. But at least initially, I need to use my baremetal Vault for the certificate generation, because those are the certs that the k8s Vault deployment will use later. To work around this issue, one could still use long-lived tokens. But another way is to use the JWT of the process that’s trying to use the Vault auth method. This requires some changes in Kubernetes though. Namely, the ServiceAccounts which should authenticate to Vault via Kubernetes auth need to have the system:auth-delegator ClusterRole. This allows the ServiceAccount’s token to be used by other apps (here, Vault) to authenticate with that token. Vault can use this to access the TokenReview API to verify that the token is valid. No change is necessary here, because the vault ServiceAccount I will be using already has the auth-delegator role.
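For completeness: if a ServiceAccount doesn’t have that role yet, granting it is a single binding, something like this (the binding name is arbitrary):
kubectl create clusterrolebinding vault-auth-delegator \
  --clusterrole=system:auth-delegator \
  --serviceaccount=vault:vault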
So with that out of the way, here is the Vault Kubernetes auth setup:
resource "vault_auth_backend" "kubernetes" {
type = "kubernetes"
}
resource "vault_kubernetes_auth_backend_config" "kube-backend-config" {
backend = vault_auth_backend.kubernetes.path
kubernetes_host = "https://k8s.example.com:6443"
issuer = "api"
disable_iss_validation = "true"
kubernetes_ca_cert = "-----BEGIN CERTIFICATE-----\n [...]"
}
resource "vault_kubernetes_auth_backend_role" "vault-certs" {
backend = vault_auth_backend.kubernetes.path
role_name = "vault-certs"
bound_service_account_names = ["vault"]
bound_service_account_namespaces = ["vault"]
token_ttl = 3600
token_policies = [vault_policy.vault-certs.name]
token_bound_cidrs = ["300.300.300.0/24"]
}
In this setup, the Kubernetes auth is created for the k8s API at k8s.example.com:6443. The general k8s info can be found via this command:
kubectl cluster-info
Kubernetes control plane is running at https://k8s.example.com:6443
CoreDNS is running at https://k8s.example.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
To get the kubernetes_ca_cert, you can have a look at the kube-root-ca.crt ConfigMap that should be available in all namespaces, for example like this:
kubectl get -n kube-system configmaps kube-root-ca.crt -o jsonpath="{['data']['ca\.crt']}"
Finally, I’Ve also restricted all tokens created by the vault-certs
role so
that they’re only valid coming from IPs in the Homelab. That’s just a small
defense in depth method I like to apply for any tokens in Vault where it’s possible.
Finally setting up the certificate generator
With the authentication now configured properly, the certificate generation can be set up like this:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: "vault-tls-certs"
spec:
refreshInterval: "4000h"
target:
name: vault-tls-certs
dataFrom:
- sourceRef:
generatorRef:
apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
name: "vault-certs-generator"
---
apiVersion: generators.external-secrets.io/v1alpha1
kind: VaultDynamicSecret
metadata:
name: "vault-certs-generator"
spec:
path: "my-ca/issue/vault-certs"
method: "POST"
parameters:
common_name: "newvault.example.com"
alt_names: "vault-0.vault-internal,vault-1.vault-internal,vault-2.vault-internal"
ip_sans: "127.0.0.1"
resultType: "Data"
provider:
server: "https://vault.example.com:8200"
caProvider:
type: Secret
name: homelab-ca-cert
namespace: vault
key: caCert
auth:
kubernetes:
mountPath: "kubernetes"
role: "vault-certs"
serviceAccountRef:
name: "vault"
With this configuration, the Vault certs are collected (still from the old baremetal Vault), with the Kubernetes authentication using the vault ServiceAccount. This now works without issue, and a certificate usable by the k8s Vault instance is generated.
I’m also setting the renewal time of the Secret containing the certificate to 4000 hours. This should lead to automatic renewal with quite some time to spare, as the certificates are given a lifetime of 4400h.
One thing to note is that the VaultDynamicSecret also needs the CA certificate. The way I’m currently supplying that one is a bit hacky. I’m deploying Vault with a Helm chart, and I’ve added this to the values.yaml file:
caBundle: |
{{- exec "curl" (list "https://vault.example.com:8200/v1/my-ca/ca/pem") | nindent 2 }}
This is a special functionality of the tool I’ve been using to manage all of the Helm charts in my cluster, Helmfile. It can interpret Go templates in the values.yaml file. That line fetches the CA certificate from the Vault endpoint and stores it in the caBundle variable.
That is then used to create a Secret with the CA like this:
apiVersion: v1
kind: Secret
metadata:
name: homelab-ca-cert
stringData:
caCert: |
{{- .Values.caBundle | nindent 6 }}
Initializing Vault
With all those Yaks safely shaven, I could finally go forward with initializing the Kubernetes Vault cluster.
I used this command:
kubectl exec -n vault vault-0 -- vault operator init -key-shares=1 -key-threshold=1 > vault-init.txt
This initialization failed with a certificate error:
Get "https://127.0.0.1:8200/v1/sys/seal-status": tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs
Even for local connections, Vault needs a cert. And that’s why I’ve got the 127.0.0.1 IP SAN in the certificate used by Vault.
After I got that issue fixed, I finally successfully initialized the Vault instance, resulting in this information:
Unseal Key 1: abcde123
Initial Root Token: hvs.foobar
Vault initialized with 1 key shares and a key threshold of 1. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 1 of these keys to unseal it
before it can start servicing requests.
Vault does not store the generated root key. Without at least 1 keys to
reconstruct the root key, Vault will remain permanently sealed!
It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
In the initialization command, I told Vault that I only need one key share. Normally, you would split the key into multiple shares so they can be distributed, but that doesn’t make any real sense for a small personal instance. If somebody somehow gets one of the key shares, they would very likely be able to get the others the same way.
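For comparison, a setup with actual key splitting would be initialized with something like this, requiring any three of five shares to unseal:
vault operator init -key-shares=5 -key-threshold=3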
It is very important to save the initial root token hvs.foobar. This is needed for the initial configuration, until some policies and other auth methods have been configured.
The next step was then to unseal all three Vault instances with these commands and the unseal key output by the vault operator init command:
kubectl exec -it -n vault vault-0 -- vault operator unseal
kubectl exec -it -n vault vault-1 -- vault operator unseal
kubectl exec -it -n vault vault-2 -- vault operator unseal
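With all three instances unsealed, you can check that the Raft cluster actually formed. The command needs a valid token inside the Pod, so log in first, e.g. with the root token from the init output:
kubectl exec -it -n vault vault-0 -- sh
# inside the Pod: authenticate, then list the raft peers
vault login
vault operator raft list-peers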
One interesting thing to note: all of the Vault Pods, as configured by the Helm chart, run with the OnDelete update strategy. This has the effect that no change to the configuration, including e.g. setting new environment variables, will take effect on its own. The Pods always need to be deleted manually to pick up a change.
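So rolling out a config change looks something like this, one Pod at a time, with an unseal after each Pod comes back up:
kubectl delete pod -n vault vault-2
# wait for the new Pod to be Running, then unseal it
kubectl exec -it -n vault vault-2 -- vault operator unseal
# repeat for vault-1 and vault-0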
Configuring Vault logging
I like having my logs all in at least approximately the same format, and so I’ve got a log parsing section for most apps in my FluentD config. Normally I don’t mention this, but Vault is a little bit weird. It does output its logs as JSON if so configured, which is good, as it makes parsing a lot simpler. But it also adds an @ symbol to the names of most of the JSON keys:
{"@level":"info","@message":"compacting logs","@module":"storage.raft","@timestamp":"2025-04-06T18:11:31.197956Z","from":893122,"to":901345}
{"@level":"info","@message":"snapshot complete up to","@module":"storage.raft","@timestamp":"2025-04-06T18:11:31.235460Z","index":911585}
And I’ve got no idea why. Or how it decides which keys get the @
and which
do not. It made my log parsing a little bit more complicated. It now looks
like this:
# Log config for the Vault deployment
<filter services.vault.vault>
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format json
time_key "@timestamp"
time_type string
time_format %iso8601
utc true
</pattern>
<pattern>
format regexp
expression /^(?<msg>.*)$/
time_key nil
</pattern>
</parse>
</filter>
<filter services.vault.vault>
@type record_modifier
remove_keys _dummy_,@level
<record>
_dummy_ ${record["level"] = record["@level"] if record.key?("@level")}
</record>
</filter>
The first filter does the main parsing, while the second one specifically removes the @ in front of the level entry in the log object, because that’s the key where my setup expects to see the log level.
Another weird thing, where Vault is by far not the biggest offender, is apps which log in multiple different formats. That’s why the first filter has a multi_format parser. For reasons I’m not sure about, Vault outputs some general information at the beginning of the log, during startup, where it doesn’t respect the log format configuration:
==> Vault server configuration:
Administrative Namespace:
Api Address: https://10.8.1.61:8200
Cgo: disabled
Cluster Address: https://vault-0.vault-internal:8201
Environment Variables: HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, TERM, VAULT_ADDR, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_LOG_FORMAT, VAULT_LOG_LEVEL, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_RAFT_NODE_ID, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_HTTPS, VAULT_SERVICE_PORT_HTTPS_INTERNAL, VERSION
Go Version: go1.23.6
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", disable_request_limiter: "false", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
Log Level: debug
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.18.5, built 2025-02-24T09:40:28Z
Version Sha: 2cb3755273dbd63f5b0f8ec50089b57ffd3fa330
==> Vault server started! Log data will stream in below:
Why output that in plain text, instead of also putting it into JSON? It seems to be a quirk of all of HashiCorp’s tools; Nomad and Consul do the same thing, if I remember correctly.
Migrating to the new Vault instance
With the instance on Kubernetes now configured, I needed to migrate the data to that instance. Sadly, there’s not really a good way to migrate K/V store entries in particular from one Vault to another. So I just went with a manual migration.
Running Terraform against the new instance
As I’ve mentioned before, I’m using Terraform for a lot of the configuration for Vault, because that is preferable to keeping a list of commands in my docs.
But the issue was: I also needed to keep the configuration for the old Vault instance, because I needed to keep that one running during the migration as well.
So I started out with just adding the second Vault as another provider to my Terraform config, via provider aliases. It looked like this:
provider "vault" {
address = "https://vault.example.com:8200"
}
provider "vault" {
alias = "k8s"
address = "https://newvault.example.com:8200"
}
This allows me to keep configurations for two Vault instances in the same Terraform state. I initially only created the userpass auth method for the new Vault, to verify that the Terraform setup worked:
resource "vault_auth_backend" "userpass" {
type = "userpass"
path = "userpass"
local = false
}
resource "vault_auth_backend" "userpass-k8s" {
provider = vault.k8s
type = "userpass"
path = "userpass"
local = false
}
With the provider setting, I could choose which Vault provider config I wanted to use. But trying a terraform apply with this configuration resulted in an error:
│ Error: failed to lookup token, err=Error making API request.
│
│ URL: GET https://vault.example.com:8200/v1/auth/token/lookup-self
│ Code: 403. Errors:
│
│ * 2 errors occurred:
│ * permission denied
│ * invalid token
This confused me - until I remembered that I had configured the Vault root token for the new k8s Vault in the terminal where I was running the command. Running my customary vault login -method=userpass username=myuser in another shell and executing the terraform apply there of course also didn’t work, because then it only had the Vault token needed for the old Vault instance.
A quick look into the Vault Terraform provider documentation led to the solution. I could configure one provider with a filepath to a token file. That would be the provider for the old Vault instance. Then I would leave the provider for the new Vault unconfigured, which would mean that it would continue to use the VAULT_TOKEN environment variable. The resulting provider config looked like this:
provider "vault" {
address = "https://vault.example.com:8200"
auth_login_token_file {
filename = "/home/myuser/.vault-token"
}
}
provider "vault" {
alias = "k8s"
address = "https://newvault.example.com:8200"
}
I would then first run the login for the old provider:
vault login -method=userpass username=myuser
Then, in the same terminal, I would set the VAULT_TOKEN variable to the root token of the new Vault:
export VAULT_TOKEN="hvs.foobar"
And with that, I was able to run terraform apply without issue, and both Vault instances were configurable.
Next, I needed to stop using the root token for the new instance and instead create a userpass login for that one as well. This I needed to do on the command line, because the Terraform resource that would be used to create a userpass user requires the password as part of the Terraform resources, and I really did not want that. So I created the user like this:
vault write auth/userpass/users/myuser password=- token_policies=admin token_ttl=4h token_max_ttl=4h token_bound_cidrs=300.300.300.12 token_type="default"
This will create the user myuser in the userpass backend and will ask for the password on the command line. Tokens issued by this auth method for the user will be valid for four hours and will only be valid when used from the 300.300.300.12 source IP, which is my Command & Control host.
Now, instead of exporting the root token in the VAULT_TOKEN variable, I could issue this command to get a token for the myuser user instead:
export VAULT_TOKEN=$(vault login -method=userpass -token-only username=myuser)
Migrating K/V secrets
After I had the userpass login configured, I could just copy+paste all of the Terraform resources for my Vault setup, add the provider = vault.k8s option, and one terraform apply later, most of the configuration was migrated to the new Vault instance on Kubernetes.
The only problem was the K/V secrets. Those are not in Terraform, because that would have required me to put my secrets into the Terraform config files and the Terraform state. After searching around a little, it looked like there was no official way to migrate K/V secrets, so I came up with my own.
First, I would export the data field, which contains the actual secrets (as opposed to some metadata), from the old Vault:
vault kv get -field data -format json secret/topsecret/database-creds > out.json
That would give me a JSON file like this:
{
"username": "foo",
"password": "bar"
}
That could then be imported into the new Vault like this:
vault kv put secret/topsecret/database-creds @out.json
I did that exactly 59 times, and all of my secrets were successfully migrated over.
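If I ever have to do this again, a small shell loop could do the job instead. This is just a rough sketch: the kv-paths.txt file with one secret path per line and the old.env/new.env files exporting VAULT_ADDR and VAULT_TOKEN for each instance are assumptions, not part of my actual setup:
# migrate a list of K/V paths from the old Vault to the new one
while read -r path; do
  # read the secret data from the old instance...
  ( . ./old.env && vault kv get -field data -format json "$path" ) > /tmp/secret.json
  # ...and write it to the new one
  ( . ./new.env && vault kv put "$path" @/tmp/secret.json )
  rm /tmp/secret.json
done < kv-paths.txt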
Update playbook changes
Another interesting piece of code I would like to talk about is my Homelab host update Ansible playbook. This playbook runs updates of the host OS, Ubuntu Server in my case, including automatic reboots and k8s node drains. But I need to manually unseal the Vault Pods once their host has been updated and rebooted. For that, I just have an Ansible task output the command I can copy+paste into another terminal to do the unseal.
This was pretty simple up to now, with the baremetal Vault: I could directly contact the host being updated, since the Vault instance on it would be the one needing the unseal. But with Vault in k8s, there’s no obvious way to determine which of the three Vault Pods runs on the host currently being updated. I needed an approach to find the right container.
The first step is to wait for the local Vault Pod on the rebooted machine to come up again, so that it would even accept the unseal command. I did that with the following task:
- name: wait for vault to be running
tags:
- kubernetes
delegate_to: candchost
become_user: candcuser
kubernetes.core.k8s_info:
kind: Pod
namespace: vault
label_selectors:
- app.kubernetes.io/name=vault
- app.kubernetes.io/instance=vault
field_selectors:
- "spec.nodeName={{ ansible_hostname }}"
wait: true
wait_condition:
status: "True"
type: "Ready"
wait_sleep: 10
wait_timeout: 300
register: vault_pod_list
This task uses the Kubernetes Ansible collection to have Ansible wait for the Vault Pod to be in Ready state. I’m also saving the list of discovered Vault Pods in a variable for later use. Via the field selector, this task only waits for the Vault Pod on the host currently being updated.
Short aside: This also taught me that I could do the following:
kubectl get pods -A --field-selector "spec.nodeName=mynode"
Instead of kubectl get pods -A -o wide | grep mynode. After over a year of running Kubernetes in my Homelab. 🤦
But let’s move on. I now had the name of the Vault Pod on the rebooted host
in the vault_pod_list
variable, which allowed me to output a command line
I could copy+paste to unseal the Vault instance:
- name: unseal vault prompt
tags:
- vault
pause:
echo: true
prompt: "Please unseal vault: kubectl exec -it -n vault {{ vault_pod_list.resources[0].metadata.name }} -- vault operator unseal"
This is a pretty convenient way to integrate manual operations into an Ansible play and works quite well. I see this prompt, copy the line and unseal the Pod, and then I just hit <Return> in the shell where Ansible is running and the play continues.
Switching the certs over to the new Vault’s CA
If you remember from further up (and I won’t be mad if you don’t, looking at the length of this post…), I was using the baremetal Vault instance to generate the certificates for the new Vault instance. But this also meant that those certs were relying on the old Vault’s CA.
The first step was to update the CA cert in the k8s Secret used for the VaultDynamicSecret for the Vault certificate, which I did by changing the line in my values.yaml file that fetches the CA:
{{- exec "curl" (list "-k" "https://newvault:8200/v1/my-ca/ca/pem") | nindent 2 }}
This did not have any direct effect on anything. The certificate Secret has a TTL of 4000 hours, so it won’t try to recreate the certs anytime soon. At the same time, Vault won’t automatically reload a new CA either, so everything was fine.
Then I went into the VaultDynamicSecret and updated the Vault URL from the old to the new Vault. This regenerated the Vault certificates. But again, Vault itself doesn’t react to that, so Vault was still up and running without issue.
Then I sent SIGHUP to each Vault instance in turn, which triggered a configuration reload, including fresh certificates.
kubectl exec -it -n vault vault-0 -- sh
kill -SIGHUP $(pidof vault)
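The same thing also works as a one-liner per Pod, without opening an interactive shell:
kubectl exec -n vault vault-1 -- sh -c 'kill -SIGHUP $(pidof vault)'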
And that’s it. I added annotations to a couple of ExternalSecrets to trigger refreshes to make sure it all worked, and it did, external-secrets successfully got the secrets from the new Vault k8s instance.
This was quite a lot more work than I thought, but it was also the second-to-last part of the migration. Now, the only thing still missing is to migrate the control plane off of the VMs on my extension host and onto the three Raspberry Pi 4 which previously served as controller nodes and are now empty, thanks to the baremetal Vault having been shut down.
But it’s Monday evening now, and the controller migration is more a weekend task, because it also includes moving the MONs of the Rook Ceph cluster, and that will need some full cluster restarts.