A gentle introduction to CockroachDB logging and Fluentd
As of version 21.1 (May 2021), CockroachDB ships with a new, powerful logging framework which greatly simplifies the organization and maintenance of logs, and eases operations, specifically around the areas of monitoring and alerting.
Log messages are clearly categorized, allowing you to route messages to select destinations. Furthermore, CockroachDB has been enhanced to log more events, improving its audit and monitoring capabilities.
Here's a quick overview of how the framework is built.
Overview
CockroachDB logs include details about notable cluster, node, and range-level events.
Each log message is composed of a payload and an envelope that contains event metadata (e.g., severity, date, timestamp, channel).
Certain notable events are reported using a structured format.
As an added bonus, these notable events are also copied to the table system.eventlog
.
Log messages are organized into logging channels and then routed through log sinks.
Each sink further processes and filters the messages before emitting them to destinations outside CockroachDB.
The mapping of channels to sinks, as well as the processing and filtering done by each sink, is configurable.
Channels
Below the list of all channels and a brief description.
Channel | Description |
---|---|
DEV | Uncategorized and debug messages. |
OPS | Process starts, stops, shutdowns, and crashes (if they can be logged); changes to cluster topology, such as node additions, removals, and decommissions. |
HEALTH | Resource usage; node-node connection events, including connection errors; up- and down-replication and range unavailability. |
STORAGE | Low-level storage logs |
SESSIONS | Client connections and disconnections; SQL authentication logins/attempts and session/query terminations. |
SQL_SCHEMA | Database, schema, table, sequence, view, and type creation; changes to table columns and sequence parameters. |
USER_ADMIN | Changes to users, roles, and authentication credentials. |
PRIVILEGES | Changes to privileges and object ownership. |
SENSITIVE_ACCESS | SQL audit events. |
SQL_EXEC | SQL statement executions and uncaught Go panic errors during SQL statement execution. |
SQL_PERF | SQL executions that impact performance, such as slow queries. |
Sinks
Channels output messages to sinks.
There are 3 types of sinks available:
- Log files
- Fluentd-compatible servers (Splunk)
stderr
Each sink can be configured with parameters. Below a list of the common parameters for all sink types.
Parameter | Description |
---|---|
filter |
Minimum severity log level |
format |
Log message format |
redact |
When true, enables automatic redaction of personally identifiable information (PII) from log messages. This ensures that sensitive data is not transmitted when collecting logs centrally or over a network. For details, see Redact logs. |
redactable |
When true, preserves redaction markers around fields that are considered sensitive in the log messages. The markers are recognized by cockroach debug zip and cockroach debug merge-logs but may not be compatible with external log collectors. For details on how the markers appear in each format, see Log formats. |
exit-on-error |
When true, stops the Cockroach node if an error is encountered while writing to the sink. We recommend enabling this option on file sinks in order to avoid losing any log entries. When set to false, this can be used to mark certain sinks (such as stderr) as non-critical. |
auditable |
If true, enables exit-on-error on the sink. Also disables buffered-writes if the sink is under file-groups. This guarantees non-repudiability for any logs in the sink, but can incur a performance overhead and higher disk IOPS consumption. This setting is typically enabled for security-related logs. |
Each sink type also features its own set of parameters.
Setup
Let's create a simple, local working environment to play around with the new logging features. This is helpful to familiarize yourself with Channels and Sinks, and validate that the output matches your requirements.
Create file logs.yaml
to store your logging configuration.
Read through it:
- we are only using files as sinks (not fluentd/Splunk)
- we use the JSON format
- the most used channels are mapped to file
cockroach.log
(identified by the keydefault
) - each other channel is mapped to a channel that bears its name, eg
cockroach-storage.log
for channelstorage
- every sink inherits the default parameters, no special params used.
file-defaults:
max-file-size: 10MiB
max-group-size: 100MiB
buffered-writes: true
filter: INFO
format: json
redact: false
redactable: true
exit-on-error: true
auditable: false
fluent-defaults:
filter: INFO
# format: json-fluent # default
redact: false
redactable: true
exit-on-error: false
auditable: false
sinks:
# fluent-servers:
# myhost:
# channels: [DEV, OPS, HEALTH]
# address: 127.0.0.1:5170
# net: tcp
file-groups:
default: # will output to cockroach.log
channels: [DEV, OPS, HEALTH, SQL_SCHEMA, USER_ADMIN, PRIVILEGES]
storage: # will output to cockroach-storage.log
channels: [STORAGE]
sensitive-access:
channels: [SENSITIVE_ACCESS]
sessions:
channels: [SESSIONS]
sql-exec:
channels: [SQL_EXEC]
sql-perf:
channels: [SQL_PERF]
sql-internal-perf:
channels: [SQL_INTERNAL_PERF]
stderr:
channels: all
filter: NONE
format: json
redact: false
redactable: true
exit-on-error: true
capture-stray-errors:
enable: true
max-group-size: 100MiB
Download and install the CockroachDB binary.
Then, start a secure database instance
# From https://www.cockroachlabs.com/docs/v21.1/secure-a-cluster#step-1-generate-certificates
mkdir certs my-safe-directory
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node localhost $(hostname) --certs-dir=certs --ca-key=my-safe-directory/ca.key
# start a secure single node cluster
cockroach start-single-node --certs-dir=certs --background --log-config-file=logs.yaml
# log into the cluster as root
cockroach sql --certs-dir=certs
File output
Good, now open a new Terminal window to inspect the files that were created
$ cd cockroach-data/logs
$ ls -l
total 552
lrwxr-x--- 1 fabio staff 58 Oct 27 15:13 cockroach-stderr.log -> cockroach-stderr.mac.fabio.2021-10-27T19_13_01Z.011252.log
-rw-r----- 1 fabio staff 910 Oct 27 15:13 cockroach-stderr.mac.fabio.2021-10-27T19_13_01Z.011252.log
lrwxr-x--- 1 fabio staff 59 Oct 27 15:13 cockroach-storage.log -> cockroach-storage.mac.fabio.2021-10-27T19_13_01Z.011252.log
-rw-r----- 1 fabio staff 10793 Oct 27 15:44 cockroach-storage.mac.fabio.2021-10-27T19_13_01Z.011252.log
lrwxr-x--- 1 fabio staff 51 Oct 27 15:13 cockroach.log -> cockroach.mac.fabio.2021-10-27T19_13_01Z.011252.log
-rw-r----- 1 fabio staff 236291 Oct 27 16:02 cockroach.mac.fabio.2021-10-27T19_13_01Z.011252.log
# display last line of cockroach.log
$ tail -n1 cockroach.log
{"channel_numeric":2,"channel":"HEALTH","timestamp":"1635365292.961975000","cluster_id":"a9dc27ba-af51-4c8c-b165-69a937b48eb9","node_id":1,"severity_numeric":1,"severity":"INFO","goroutine":20,"file":"server/status/runtime.go","line":569,"entry_counter":478,"redactable":1,"tags":{"n":"1"},"message":"runtime stats: 178 MiB RSS, 274 goroutines (stacks: 3.7 MiB), 47 MiB/85 MiB Go alloc/total (heap fragmentation: 7.2 MiB, heap reserved: 13 MiB, heap released: 58 MiB), 71 MiB/78 MiB CGO alloc/total (0.6 CGO/sec), 1.4/1.1 %(u/s)time, 0.0 %gc (0x), 24 KiB/22 KiB (r/w)net"}
As expected, we see file cockroach.log
, cockroach-storage.log
and cockroach-stderr.log
.
The format is JSON, as configured.
But where are the other log files? We're missing those related to sessions, audit, sql-exec...
Some logging channels needs to be enabled by means of Cluster Settings.
For example, channel SESSIONS
is enabled by these 2 cluster settings, set to False
by default:
-
server.auth_log.sql_connections.enabled
--> if set, log SQL client connect and disconnect events -
server.auth_log.sql_sessions.enabled
--> if set, log SQL session login/disconnection events
Check the Logging Channels for more information on which cluster settings you need to enable.
Now, let's change a simple setting in logs.yaml
to test how logs look like.
In this exercise, we change the log format from json
to crdb-v2
, which is more humanly readable.
# stop the cluster
$ cockroach quit --certs-dir=certs
# on my macbook, sed is aliased to gsed (GNU sed)
$ sed -i 's/json/crdb-v2/g' logs.yaml
# restart the cluster
$ cockroach start-single-node --certs-dir=certs --background --log-config-file=logs.yaml
# check again the last line of cockroach.log
$ tail -n1 cockroach.log
I211027 20:20:40.838139 447 sql/sqlliveness/slinstance/slinstance.go:144 ⋮ [n1] 52 created new SQL liveness session ‹ef0fdfe18b014b42a809f36257c3818b›
As expected, we see the new format.
Let's do one more exercise.
At the SQL prompt, enable the SQL_EXEC
channel
SET CLUSTER SETTING sql.trace.log_statement_execute = True;
and immediately you will see these new files
$ ls -l
total 904
lrwxr-x--- 1 fabio staff 60B Oct 27 16:31 cockroach-sql-exec.log@ -> cockroach-sql-exec.mac.fabio.2021-10-27T20_31_16Z.013087.log
-rw-r----- 1 fabio staff 3.5K Oct 27 16:31 cockroach-sql-exec.mac.fabio.2021-10-27T20_31_16Z.013087.log
[...]
$ tail -n1 cockroach-sql-exec.log
I211027 20:34:11.553235 10838 9@util/log/event_log.go:32 ⋮ [intExec=‹cancel/pause-requested›] 95 ={"Timestamp":1635366851551571000,"EventType":"query_execute","Statement":"‹UPDATE \"\".system.jobs SET status = CASE WHEN status = $1 THEN $2 WHEN status = $3 THEN $4 ELSE status END WHERE (status IN ($1, $3)) AND ((claim_session_id = $5) AND (claim_instance_id = $6)) RETURNING id, status›","Tag":"UPDATE","User":"node","ApplicationName":"$ internal-cancel/pause-requested","PlaceholderValues":["‹'pause-requested'›","‹'paused'›","‹'cancel-requested'›","‹'reverting'›","‹'\\xef0fdfe18b014b42a809f36257c3818b'›","‹1›"],"ExecMode":"exec-internal","Age":1.631}
Very good! Now you have a simple setup to test how logging works.
Fluentd output
Fluentd is an open-source project for data collection. Fluentd allows you to have a unified view of all your logs from all your applications. Many companies use Fluentd, or use products that receive the same protocol.
You can hook up CockroachDB to Fluentd very easily to test how logging is handled over a network.
Create a file fluent.conf
in your home directory
##########
# Inputs #
##########
# this source reads files in a directory
# <source>
# @type tail
# path "/var/log/*.log"
# read_from_head true
# tag gino
# <parse>
# @type json
# time_type string
# time_format "%Y-%m-%d %H:%M:%S.%N"
# # time_type float
# time_key timestamp
# </parse>
# </source>
# this source listens on a network port
<source>
@type tcp
tag CRDB
<parse>
@type json
time_type float
time_key timestamp
</parse>
port 5170
bind 0.0.0.0
delimiter "\n"
</source>
###########
# Outputs #
###########
# output to file buffer
<match **>
@type file
path /output/
append true
<buffer>
timekey 1d
timekey_use_utc true
timekey_wait 1m
</buffer>
<format>
@type out_file
delimiter ","
time_format "%Y-%m-%d %H:%M:%S.%N"
</format>
</match>
Then, start the server using Docker
cd ~
mkdir output
# make sure you map the path accordingly
docker run -ti --rm -v /Users/fabio/fluent.conf:/fluentd/etc/fluent.conf -v /Users/fabio/output:/output -p=5170:5170 fluent/fluentd
If Fluentd starts successfully, you should see this output on stdout:
2021-11-01 16:56:24 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2021-11-01 16:56:24 +0000 [info]: using configuration file: <ROOT>
[...]
2021-11-01 16:56:24 +0000 [info]: adding source type="tcp"
2021-11-01 16:56:24 +0000 [info]: #0 starting fluentd worker pid=17 ppid=7 worker=0
2021-11-01 16:56:24 +0000 [info]: #0 fluentd worker is now running worker=0
Ok, Fluent is correctly listening for incoming messages.
Now, stop CockroachDB
cockroach quit --certs-dir=certs
Uncomment this section in the logs.yaml
file
fluent-servers:
myhost:
channels: [DEV, OPS, HEALTH]
address: 127.0.0.1:5170
net: tcp
Restart CockroachDB
cockroach start-single-node --certs-dir=certs --background --log-config-file=logs.yaml
Check directory output
and tail the buffer log file
$ tail -n1 output/buffer.*.log
2021-11-01 17:03:09.353577800 CRDB {"tag":"cockroach.health","c":2,"t":"1635786189.332476000","x":"92c25c70-6e91-41f6-9480-6b2de3cb729a","N":1,"s":1,"sev":"I","g":325,"f":"server/status/runtime.go","l":569,"n":73,"r":1,"tags":{"n":"1"},"message":"runtime stats: 101 MiB RSS, 271 goroutines (stacks: 3.5 MiB), 27 MiB/58 MiB Go alloc/total (heap fragmentation: 7.4 MiB, heap reserved: 8.9 MiB, heap released: 17 MiB), 3.8 MiB/7.0 MiB CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (0x), 79 KiB/180 KiB (r/w)net"}
Good stuff, you successfully integrated Fluentd with CockroachDB logging framework!