Ignition - Remote Logging for Console-less cloud servers
Issue Report
Feature Request
Enhance Ignition to log remotely, so that failures encountered during boot can be debugged.
Environment
AWS EC2 servers, which do NOT have an interactive console and often show no content in the system log. Console scrollback is also not available.
Desired Feature
Add a configuration option in Ignition to log remotely or to serve logs on request. For example: https://www.freedesktop.org/software/systemd/man/systemd-journal-gatewayd.html
Other Information
Because it is important that Ignition doesn't mask failures, we must lock up CoreOS and not allow SSH in for debugging (as that may be a non-obvious failure).
However, in AWS, we can't drop down to console to execute journalctl commands and thus failure logs are also masked.
Instead we could make journals remotely available by streaming them to an external journal collector or opening up a port to request journals for debugging.
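To make the "open a port to request journals" idea concrete, here is a minimal sketch of a client pulling entries from systemd-journal-gatewayd over HTTP. It assumes the gateway is actually running and reachable on the failed machine (which Ignition does not set up today), that it listens on its default port 19531, and that filtering on a unit such as `ignition-disks.service` is what you want. The host address and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GATEWAYD_PORT = 19531  # default port of systemd-journal-gatewayd


def build_journal_request(host, unit=None, current_boot=True):
    """Build a GET request for journal entries in JSON form."""
    params = []
    if current_boot:
        params.append("boot")  # limit output to the current boot
    if unit:
        # Field match; the unit name is an assumption for illustration.
        params.append("_SYSTEMD_UNIT=" + urllib.parse.quote(unit, safe=""))
    query = "?" + "&".join(params) if params else ""
    url = "http://%s:%d/entries%s" % (host, GATEWAYD_PORT, query)
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def parse_entries(body):
    """Parse newline-delimited JSON entries and return their text messages."""
    messages = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        message = json.loads(line).get("MESSAGE")
        # Binary messages are not plain strings in the JSON export; skip them.
        if isinstance(message, str):
            messages.append(message)
    return messages


def fetch_messages(host, unit=None, timeout=10):
    """Fetch and decode journal messages from a remote gateway."""
    request = build_journal_request(host, unit=unit)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_entries(response.read().decode("utf-8", "replace"))
```

Something like `fetch_messages("10.0.0.5", unit="ignition-disks.service")` would then return the messages for that unit, provided the port is open in the instance's security group.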
An example of a failure could be where the Ignition configuration is syntactically correct, but requests a nonexistent drive or network interface and thus fails: for instance, trying to format a drive label that doesn't exist, or using a device path that is different on a new type of cloud server. These would all pass validation but fail to boot, leaving the sysadmin completely in the dark.
The only thing I can think of here would be to have Ignition try to mount a boot partition and store its logs there on a failure event. Then you could mount the disk on another instance and see them.
But really, I feel like viewing the serial console log should show the failure (assuming there exists a console=ttyS0 kernel cmdline arg).
I had opened a similar issue long ago (I cannot find it now). My intended use case was to support bare metal machines that don't have lights-out management. Writing the logs to a disk isn't always possible though (Container Linux supports disk-less boot), so the idea was to stream the logs elsewhere.
I'm curious about your environment, @pctj101. I didn't know AWS let you create instances without a console log. Which instance type are you using?
I'll take another look tomorrow.
Perhaps I wasn't waiting long enough for the system logs to show up. Typically so far, after boot my console log is completely blank. If it takes several minutes for AWS console logs to catch up, perhaps we should log to something a bit more... 'real-time'.
I did find that AWS takes 4 minutes to propagate console logs to the web ui. I'm not sure if there's a better way without making ignition too overweight and bulky. What do you think?
Now that I've verified logs appear, this kind of falls solidly into the "nice to have" bucket (rather than "need to have")
Ignition would need to learn to save its own logs (i.e. keep its own internal copy in addition to what is logged to the journal). It could then POST its logs, along with a success or failure indication, to a URL provided in the ignition section of the config. We do something similar on Packet, which has a timeline you can post custom events to. I don't think that AWS has something similar. Thoughts?
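As a rough illustration of the POST idea, here is a minimal sketch of reporting an outcome and the buffered log lines to a collector URL. It is not how Ignition works today: the payload shape (`status`, `logs`), the function names, and the collector endpoint are all assumptions made for the example.

```python
import json
import urllib.request


def build_report(success, log_lines):
    """Serialize the outcome and the buffered log lines as a JSON body."""
    payload = {
        "status": "success" if success else "failure",  # hypothetical field
        "logs": list(log_lines),                        # hypothetical field
    }
    return json.dumps(payload).encode("utf-8")


def post_report(url, success, log_lines, timeout=10):
    """POST the report to the configured URL and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=build_report(success, log_lines),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The report is sent in both the success and the failure case, as suggested above, so the absence of any report is itself a signal that the machine never got as far as running Ignition or had no network.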
That's a cool thought. On AWS, CloudWatch is a reasonable place to POST data to, since that's where AWS seems to send all logs by default. Serverless Lambda functions, for example, log data to CloudWatch and it's "almost real-time" (much less lag than console logs at least).
I don't know the details, but perhaps it's something along this idea: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html
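For reference, PutLogEvents expects a list of events, each with a millisecond `timestamp` and a non-empty `message`, in chronological order and with at most 10,000 events per call. A minimal sketch of shaping buffered log lines that way follows. The helper names are hypothetical, and the per-batch byte-size limit is not handled here.

```python
MAX_EVENTS_PER_BATCH = 10000  # documented per-call limit for PutLogEvents


def to_log_events(records):
    """Turn (timestamp_ms, message) pairs into PutLogEvents-style events.

    Empty messages are dropped and events are sorted chronologically.
    """
    events = [
        {"timestamp": int(timestamp_ms), "message": message}
        for timestamp_ms, message in records
        if message
    ]
    events.sort(key=lambda event: event["timestamp"])
    return events


def batches(events, size=MAX_EVENTS_PER_BATCH):
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]
```

Each batch could then be handed to an AWS SDK, for example boto3's `put_log_events(logGroupName=..., logStreamName=..., logEvents=batch)` on a `logs` client. The log group and stream would have to exist already, and the instance would need credentials that allow the call, which is a fair amount of machinery to pull into Ignition.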