prometheus-engine icon indicating copy to clipboard operation
prometheus-engine copied to clipboard

config-reloader Restarts Multiple Times on Boot

Open silvamerica opened this issue 2 years ago • 3 comments

I've been investigating why the config-reloader pod occasionally restarts multiple times while booting, and I've narrowed it down to this line. I'm seeing the following in the logs:

Get \"http://localhost:19090/-/ready\": dial tcp 127.0.0.1:19090: connect: connection refused

Per the comment on line 73, it seems like the intent would be to continue polling in this situation, but instead the process exits and has to be restarted.

silvamerica avatar May 11 '23 22:05 silvamerica

Hi @silvamerica,

Thanks for pointing this out!

The thought behind not tolerating the error was that it would be a non-recoverable issue with the http client or something. However, as it turns out, that's not the case - as you have highlighted.

Open PR at #474.

pintohutch avatar May 12 '23 17:05 pintohutch

@pintohutch to clarify, will we lose any metrics being collected on that node if we encounter this crash? Or is this just isolated to the config-reloader sidecar?

mathe-matician avatar Dec 27 '23 14:12 mathe-matician

Hey @mathe-matician - you will not lose metrics. This is just the config-reloader restarting because prometheus isn't ready to load config yet (presumably due to startup delay).

pintohutch avatar Jan 04 '24 14:01 pintohutch