Ignition - Remote Logging for Console-less cloud servers

Open pctj101 opened this issue 7 years ago • 7 comments

Issue Report

Feature Request

Enhance Ignition to log remotely, to aid in debugging failures encountered during boot.

Environment

AWS EC2 Servers which do NOT have a console and often do not have content in system logs available. Console scrollback is also not available.

Desired Feature

Add a configuration option to Ignition to log remotely or serve its logs on request. For example: https://www.freedesktop.org/software/systemd/man/systemd-journal-gatewayd.html

Other Information

Because it is important that Ignition doesn't mask failures, we must lock up CoreOS and not allow SSH in for debugging (as that may be a non-obvious failure).

However, in AWS, we can't drop down to console to execute journalctl commands and thus failure logs are also masked.

Instead we could make journals remotely available by streaming them to an external journal collector or opening up a port to request journals for debugging.

An example of a failure could be where the ignition configuration is syntactically correct, but requests a non-existing drive/network interface and thus fails. Perhaps trying to format a drive label that doesn't exist, or a device path that is different on a new type of cloud server. These would all pass validation but fail to boot leaving the sys-admin completely in the dark.

pctj101 avatar Oct 19 '18 06:10 pctj101

The only thing I can think of here would be to have Ignition try to mount a boot partition and store its logs there on a failure event. Then you could mount the disk on another instance and see them.

But really, I feel like viewing the serial console log should show the failure (assuming there exists a console=ttyS0 kernel cmdline arg).

dustymabe avatar Oct 19 '18 13:10 dustymabe

I had opened a similar issue long ago (I cannot find it now). My intended use case was to support bare metal machines that don't have lights-out management. Writing the logs to a disk isn't always possible though (Container Linux supports disk-less boot), so the idea was to stream the logs elsewhere.

I'm curious about your environment, @pctj101. I didn't know AWS let you create instances without a console log. Which instance type are you using?

crawford avatar Oct 19 '18 18:10 crawford

I'll take another look tomorrow.

Perhaps I wasn't waiting long enough for the system logs to show up. Typically so far, after boot my console log is completely blank. If it takes several minutes for AWS console logs to catch up, perhaps we should log to something a bit more... 'real-time'.

pctj101 avatar Oct 21 '18 11:10 pctj101

I did find that AWS takes 4 minutes to propagate console logs to the web ui. I'm not sure if there's a better way without making ignition too overweight and bulky. What do you think?

Now that I've verified logs appear, this kind of falls solidly into the "nice to have" bucket (rather than "need to have")

pctj101 avatar Oct 22 '18 10:10 pctj101

Ignition would need to learn to save its own logs (i.e. keep its own internal copy in addition to what is logged to the journal). It could then POST its logs, along with a success or failure indication, to a provided URL (in the ignition section) in either case. We do something similar on Packet, which has a timeline you can post custom events to. I don't think that AWS has something similar. Thoughts?

ajeddeloh avatar Oct 22 '18 17:10 ajeddeloh

That's a cool thought. On AWS "cloudwatch" is a reasonable place to POST data to since that's where AWS seems to send all logs to by default. Serverless lambda functions for example log data to cloudwatch and it's "almost real-time" (much less lag than console logs at least).

pctj101 avatar Oct 23 '18 05:10 pctj101

I don't know the details, but perhaps it's something along this idea: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html

pctj101 avatar Oct 23 '18 06:10 pctj101