While working on the logging setup for my Grafana, Loki and CloudNativePG deployments, I found that there were some things I disliked about my original k8s logging setup, which I described here.
This is the first of a new kind of post, where I try to keep the reading time reasonably short. Whenever I prefix a post with “Bit Sized:”, you can expect a short one. I’m trying to wean myself off the incredibly long, meandering posts I seem to keep putting out.
Hindsight: Well, at least I stayed under 10 minutes.
The trigger: CloudNativePG logs
I’m running CloudNativePG for my database needs in the new k8s cluster. Configuring Grafana for my metrics stack was the first time I deployed a production database with it. So obviously, I needed to get the logs properly ingested. To see what we’re dealing with, here are the first couple of log lines from the Postgres Pod in a CloudNativePG deployment:
{"level":"info","ts":"2024-03-29T10:15:37Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-03-29T10:15:37Z","logger":"postgres","msg":"2024-03-29 10:15:37.836 UTC [23] LOG: redirecting log output to logging collector process","pipe":"stderr","logging_pod":"grafana-pg-cluster-1"}
{"level":"info","ts":"2024-03-29T10:15:37Z","logger":"postgres","msg":"record","logging_pod":"grafana-pg-cluster-1","record":{"log_time":"2024-03-29 10:15:37.836 UTC","process_id":"23","session_id":"660694c9.17","session_line_num":"3","session_start_time":"2024-03-29 10:15:37 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv4 address \"0.0.0.0\", port 5432","backend_type":"postmaster","query_id":"0"}}
All three are JSON, which is nice already, meaning I won’t need any regexes.
The first line is the easiest to handle, produced by the management part of the CloudNativePG Pod. The second is already a bit more complicated, because it contains a full Postgres formatted log message. From that line, I would at least like to get the log level (LOG in this case) as well as the timestamp. The third line is the most complicated, because it contains the Postgres log in the record key, now in JSON format as well.
But the above is not actually how those logs arrive in my FluentD instance. First of all, because I use cri-o, the first log line would look like this in the actual log file:
2024-03-29T11:15:37.627916025+01:00 stderr F {"level":"info","ts":"2024-03-29T10:15:37Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
cri-o adds the prefix <time> <stream> <_p> <log> to every log line a container produces, and as far as I can tell, this format is standardized for CRI implementations.
To ingest these logs and send them on to FluentD, I’m using Fluentbit, with its
Kubernetes plugin.
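The relevant pieces of that setup look roughly like this - the tag pattern, paths and the FluentD host are illustrative placeholders, not copied verbatim from my config:
# The tail input reads the container log files written by cri-o; the
# built-in cri parser handles the <time> <stream> <_p> <log> prefix.
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    multiline.parser  cri

# The Kubernetes filter adds Pod metadata to each record. Merge_Log is the
# option that unpacks JSON found in the <log> field into the record itself.
[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On

# Forward everything to FluentD (host and port are placeholders).
[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd.example.svc
    Port   24224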
My mistake here was to leave the option Merge_Log enabled. If the content of <log> is a JSON string, the Kubernetes plugin takes that JSON object and adds its keys to the record directly. This means that by the time one of the log lines above arrives at my FluentD instance, it would look like this:
{[...]
record="{\"log_time\"=>\"2024-03-29 22:55:27.814\" [...]
}
This is not JSON; it’s FluentD’s nested key format. The issue with this is that FluentD does not seem to have any good tools for working with an entire subkey. So I can’t do anything with the record key as a whole here - I can only access specific subkeys, e.g. record['record']['log_time'], at least as long as I don’t want to go to Ruby. And I really, really don’t want to.
Time!
This led me to another issue: While scrolling through some logs in my Grafana dashboard, I saw that pretty much all of them were prefixed with a time key. That looks odd, because time should not appear in a log record’s keys.
It turns out that this is the record key into which Fluentbit’s CRI parser puts the <time> field from the cri-o log line! And because I generally don’t do anything with that key, preferring to use the timestamp from the actual application log, I had just left it in accidentally, polluting the log lines with unnecessary information.
But, and here comes the real issue: I couldn’t just change this behavior. The cri-o multiline parser is embedded into Fluentbit and not configurable. So even though Fluentbit generally lets you configure whether a parser keeps the time field of the parsed log, it did not allow me to do so here.
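For a parser I define myself in parsers.conf, that is just a flag on the parser definition - the parser below is only an example, not one taken from my config:
[PARSER]
    Name         example-json-app
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
    # This is the switch the embedded cri parser does not expose:
    # with Off, the parsed time field is dropped from the record.
    Time_Keep    Off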
Next possibility: Just drop the time key in Fluentbit, before even sending the log on to FluentD. But this also wasn’t possible, because of the Merge_Log config of the Fluentbit Kubernetes plugin. Some apps, e.g. Traefik in my setup, produce JSON log lines which themselves contain a field called time.
Consequently, when a Traefik log line runs through the Kubernetes plugin, it ends up with Traefik’s time field, not the original one from the cri-o parser, at the top level of the log record. Removing it would mean deleting Traefik’s own timestamp, which I did not want.
Instead of investigating further, I decided to switch Merge_Log
off. This way,
Fluentbit never touches the actual log line produced by the app. It only parses
the CRI parts and slaps on some Kubernetes Pod labels, then sends the entire
enchilada on to FluentD for proper massaging.
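The change itself is small - the Kubernetes filter section now looks roughly like this (tag pattern again illustrative):
[FILTER]
    Name       kubernetes
    Match      kube.*
    # Leave the application's JSON alone; FluentD parses the log field
    # later, only for the apps where that makes sense.
    Merge_Log  Off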
This has the downside that now, I don’t get “parsed” JSON logs for free, but quite honestly, it isn’t that much work to add something like this for the apps which produce JSON formatted logs:
<filter services.monitoring.grafana>
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type json
time_key t
time_type string
time_format %iso8601
utc true
</parse>
</filter>
This FluentD config only touches the log key of the incoming record. It tries to parse it as JSON, adds all of the JSON object’s keys to the record (removing the original log key), and parses the time from the t key, dropping that key afterwards.
I find this somehow more comforting, because I can be sure that I will always
find the app’s original log output in the log
key and can then do whatever
parsing I want in FluentD, leaving Fluentbit mostly as a log forwarder/shipper.
I still had to drop the time
key from all incoming logs, but now I could be
sure that I was only removing the <time>
part from the CRI log file entry,
keeping the actual app log’s time
key.
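I won’t go into where exactly that drop sits in my pipeline, but in FluentD it boils down to a record_transformer filter along these lines - the match pattern is a placeholder, and it has to run before any per-app JSON parsing so it can’t eat e.g. Traefik’s own time key:
<filter services.**>
  @type record_transformer
  # "time" at this point is only the key the CRI parser added; the app's
  # own timestamp is still inside the unparsed log field.
  remove_keys time
</filter>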
Parsing CloudNativePG logs
Finally back to the CloudNativePG logs. The first step of the parsing process
is to parse the JSON log from the log
key:
<filter services.*.postgres>
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type json
time_key nil
</parse>
</filter>
I’m ignoring the time from that log, for now.
Then I have to broadly split the records into two categories: the logs coming from the cnpg management plane, and the actual Postgres logs. That’s done as follows:
<match services.*.postgres>
@type rewrite_tag_filter
<rule>
key record
pattern /^.+$/
tag pg-record.${tag}
label @PGRECORD
</rule>
<rule>
key record
pattern /^.+$/
tag pg-no-record.${tag}
invert true
label @PGNORECORD
</rule>
</match>
This checks whether the record (after parsing of log
) has a record
key,
and sends the records to different labels.
First comes the PGRECORD
label, which handles log lines which look like this:
{"level":"info","ts":"2024-03-29T10:15:37Z","logger":"postgres","msg":"record","logging_pod":"grafana-pg-cluster-1","record":{"log_time":"2024-03-29 10:15:37.836 UTC","process_id":"23","session_id":"660694c9.17","session_line_num":"3","session_start_time":"2024-03-29 10:15:37 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv4 address \"0.0.0.0\", port 5432","backend_type":"postmaster","query_id":"0"}}
They contain the JSON formatted Postgres log in the record
key.
<label @PGRECORD>
<filter **>
@type record_transformer
remove_keys record
enable_ruby true
<record>
json_record ${record["record"].to_json}
</record>
</filter>
<filter **>
@type parser
key_name json_record
reserve_data true
remove_key_name_field true
<parse>
@type json
time_key nil
</parse>
</filter>
<filter **>
@type record_transformer
remove_keys log_time,message,error_severity
<record>
ts ${record["log_time"]}
level ${record["error_severity"]}
msg ${record["message"]}
</record>
</filter>
<match pg-record.**>
@type relabel
@label @PGNORECORD
</match>
</label>
Okay, so one after the other. First, I need to transform the content of the record key back into JSON - because in the initial parsing of the log key, it would have been transformed into a FluentD nested key structure. Then I’m parsing that JSON again. The effect is that all the keys from the original record object are now keys of the record itself, and the record field has been removed.
Then I’m rewriting the names of a couple of keys, so that they match the log lines produced by the cnpg management app. After the Postgres log records pass through this pipeline, all the logs coming from the Postgres Pod, regardless of whether they come from Postgres itself or the management plane, look approximately the same.
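To illustrate, the record-style example line from above looks roughly like this at this point (Kubernetes metadata and a few of the remaining Postgres keys omitted):
{"level":"LOG","ts":"2024-03-29 10:15:37.836 UTC","msg":"listening on IPv4 address \"0.0.0.0\", port 5432","logger":"postgres","logging_pod":"grafana-pg-cluster-1","process_id":"23","backend_type":"postmaster",[...]}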
The unified logs are then further massaged by this pipeline:
<label @PGNORECORD>
<filter pg-no-record.**>
@type parser
key_name msg
reserve_data true
remove_key_name_field false
<parse>
@type multi_format
<pattern>
format regexp
expression /^(?<ts>[0-9\-]* [0-9\:\.]* [^\ ]+) [^\ ]* (?<pglvl>[^\ ]*) (?<msg>.*)$/
time_key nil
</pattern>
<pattern>
format regexp
expression /^(?<msg>.*)$/
time_key nil
</pattern>
</parse>
</filter>
<filter **>
@type parser
key_name ts
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format regexp
expression /^(?<logtime>[0-9\-]* [0-9\:\.]* [^\ ]+)$/
time_key logtime
time_type string
time_format %F %T.%N %Z
</pattern>
<pattern>
format regexp
expression /^(?<logtime>[0-9]{4}-[01][0-9]-[0-3][0-9]T[0-2][0-9]:[0-6][0-9]:[0-6][0-9].*)$/
time_key logtime
time_type string
time_format %iso8601
utc true
</pattern>
</parse>
</filter>
<match **>
@type rewrite_tag_filter
remove_tag_regexp /^pg-(no-)?record\./
<rule>
key msg
pattern /^.+$/
tag parsed.${tag}
label @K8S
</rule>
</match>
</label>
Again, going step by step. The first filter config takes the msg key and parses it, trying two regex parsers in order. The first one recognizes Postgres log lines in string format, like this one:
2024-03-29 10:15:37.836 UTC [23] LOG: redirecting log output to logging collector process
It properly parses the timestamp and level, which are the two most important parts for me. All other content of the msg key is left unparsed and is just written back to the msg key as-is. The second pattern is a catch-all for everything else, e.g. the cnpg management plane logs, whose msg simply passes through unchanged.
Next, I’m parsing the timestamp ts, which now looks different depending on whether the log line came from Postgres or from cnpg. For Postgres, the timestamp looks like this: 2024-03-29 10:15:37.836 UTC, while for cnpg it is a properly formatted ISO8601 date/time.
And finally I rewrite the tag to start with parsed
, which indicates in my
config that the record should skip all the parsing pipelines.
Endless loops - Fluentbit edition
Before ending for today, one thing needs an honorary mention: I built another endless loop! 🎉
This time, just by enabling Fluentbit’s debug log verbosity. At that level, it produces a log entry for every message it forwards to FluentD - including the messages about forwarding its own logs, which of course get tailed and forwarded again:
component=output:forward:forward.0 msg="send options records=0 chunk=''
The one advantage of having my rack right next to my desk: I heard the fans crank the RPMs right away. 😅
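If I ever need Fluentbit debug logs again, one way to avoid the loop would be to exclude Fluentbit’s own container logs from the tail input - untested so far, and the path pattern assumes the Pods have fluent-bit in their name:
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    # Do not feed Fluentbit its own (debug) output back into the pipeline.
    Exclude_Path      /var/log/containers/*fluent-bit*.log
    multiline.parser  cri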
Now you know why I was talking about Eldritch Horrors and the log pipeline being the most complicated part of my Homelab. I actually planned to just tidy up the monitoring stack migration a bit over the long Easter weekend and write the blog post, so I could get started with the next k8s migration step this week. Instead this is what I decided to spend my long weekend on. 🤦