Boulder should allow logging to a TCP socket
When we run Boulder in a chroot environment, whether a podman container or something using libcontainer, we won't have access to /dev/log, and providing it within the chroot is A Whole Ordeal. While it's not great (tm) to send logs through the network stack, even localhost-only, it's probably inevitable if we want to constrain Boulder's operating environment further and assume there is no UNIX socket available for logging.
https://github.com/letsencrypt/boulder/blob/a2ff222fdadc2414ab592c0ad96ae15738119175/log/log.go#L59-L66
A couple of alternatives:
- The container ecosystem seems to prefer collecting logs on stdout/stderr. Maybe we should switch to that style of log collection? Boulder is already happy to emit logs on stderr. It uses a slightly bespoke prefix (I203959 / W204020 for info and warnings, respectively) but we could easily adapt that for easy ingestion by whatever system we want to use to forward logs (presumably still rsyslog).
- For the boulder docker-compose dev environment, we run rsyslog inside the container. It's not too bad. Is that an option here?
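To make the "adapt the prefix" idea concrete, here is a rough sketch of parsing the stderr prefix described above into fields a forwarder could use. It assumes the prefix is one severity letter followed by six digits read as HHMMSS, then a space and the message; that layout is inferred from the two examples and is not confirmed against Boulder's source. `parseLine` and `entry` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// prefixRE matches the assumed layout: severity letter, HHMMSS, space, message.
var prefixRE = regexp.MustCompile(`^([A-Z])(\d{2})(\d{2})(\d{2}) (.*)$`)

// severities covers only the two letters mentioned in the thread.
var severities = map[string]string{"I": "info", "W": "warning"}

// entry is a hypothetical structured form of one stderr log line.
type entry struct {
	Severity string
	Time     string
	Message  string
}

// parseLine splits a line into severity, time, and message. The boolean
// is false when the line does not carry the assumed prefix.
func parseLine(line string) (entry, bool) {
	m := prefixRE.FindStringSubmatch(line)
	if m == nil {
		return entry{}, false
	}
	sev, ok := severities[m[1]]
	if !ok {
		sev = "unknown"
	}
	return entry{
		Severity: sev,
		Time:     m[2] + ":" + m[3] + ":" + m[4],
		Message:  m[5],
	}, true
}

func main() {
	e, ok := parseLine("I203959 example message")
	fmt.Println(e.Severity, e.Time, e.Message, ok)
}
```

In practice this kind of parsing would more likely live in the forwarder's configuration (for example an rsyslog rule) than in Go, but the field split would be the same.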
Both are possible. In degenerate cases where the logserver is having a hiccup, rsyslog consumes pretty considerable amounts of RAM in its queuing, and getting the resource management for that correct on a big worker node doesn't sound fun, but it's doable.
I look at sidecars like that in the same vein as ProxySQL - I probably don't want ProxySQL to be a sidecar, I'd rather it be a local service that every Boulder on the worker uses. It, too, won't be able to communicate anymore directly via a socket, which is unfortunate. Is it the best use of resources to sidecar it?
Stdout/stderr works fine, we could certainly just do that, but it limits Boulder's means to detect logging failures.
There's no specific timeline on this, but at present Boulder panics on startup in a chroot environment, and one of the options here will be needed to lock it down further someday.
Regarding "lacking the means to detect logging failures": maybe this is the moment to return to the idea of making our audit log events not actual "logs", and instead writing them to some persistent and replicated data store with better semantics than the file system.
I am going to open a ticket in the Epic to figure this out, and I'm fine with closing this one No Action.