prometheus-boshrelease

FirehoseExporterLastEnvelopeReceivedTooOld

Open Muruganfly03 opened this issue 6 years ago • 5 comments

We are using Prometheus 23.3.0 in our environment. We have been receiving the alert below for the past two days.

The firehose_exporter at prod/10.x.x.x:9186 last Envelope received was more than 10m 0s ago.

Kindly let us know what is causing this issue and how we can fix it.
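The alert text suggests a staleness check: the time since the last received envelope is compared against a 10-minute threshold. The following is only a minimal Python sketch of that comparison, not the actual alert rule shipped with the release. The function name `envelope_too_old` and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the alert text ("more than 10m 0s ago").
MAX_ENVELOPE_AGE = timedelta(minutes=10)


def envelope_too_old(last_envelope_at, now, max_age=MAX_ENVELOPE_AGE):
    """Return True when the last envelope is strictly older than max_age.

    Hypothetical helper: it only illustrates the comparison implied by
    the alert message, not how the real rule is written.
    """
    return now - last_envelope_at > max_age


# Example timestamps; "now" is the date printed later in this thread.
now = datetime(2019, 11, 19, 11, 5, 39, tzinfo=timezone.utc)
print(envelope_too_old(now - timedelta(minutes=3), now))   # False
print(envelope_too_old(now - timedelta(minutes=25), now))  # True
```

Under this reading, the alert says nothing about whether the process is running. It only says that no envelope has been seen within the threshold.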

Muruganfly03 avatar Nov 19 '19 11:11 Muruganfly03

Hi @Muruganfly03, did you check the logs from the firehose_exporter? Sometimes a restart of the firehose exporter solves that issue.

benjaminguttmann-avtq avatar Nov 19 '19 11:11 benjaminguttmann-avtq

Hi Benjamin

I logged into the firehose_exporter VM and found no errors reported in the past two weeks. The process looks healthy and data collection is happening.

    firehose/xxxxxxxxx:/var/vcap/sys/log/firehose_exporter# ls -lrt
    total 16
    -rw-r--r-- 1 root root 0 Aug 13 09:19 firehose_exporter.stdout.log
    -rw-r--r-- 1 root root 263 Oct 23 09:46 firehose_exporter_ctl.stdout.log
    -rw-r--r-- 1 root root 91 Oct 23 09:46 firehose_exporter_ctl.stderr.log
    -rw-r--r-- 1 root root 5549 Nov 5 06:36 firehose_exporter.stderr.log
    firehose/xxxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# date
    Tue Nov 19 11:05:39 UTC 2019
    firehose/xxxxxxxxx:/var/vcap/sys/log/firehose_exporter# monit summary
    The Monit daemon 5.2.5 uptime: 27d 1h 19m

    Process 'firehose_exporter' running
    Process 'bosh-dns' running
    Process 'bosh-dns-healthcheck' running
    Process 'fim' running
    System 'system_localhost' running
    firehose/xxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# monit status
    The Monit daemon 5.2.5 uptime: 27d 1h 19m

    Process 'firehose_exporter'
      status running
      monitoring status monitored
      pid 8201
      parent pid 1
      uptime 27d 1h 19m
      children 0
      memory kilobytes 44976
      memory kilobytes total 44976
      memory percent 1.1%
      memory percent total 1.1%
      cpu percent 0.0%
      cpu percent total 0.0%
      data collected Tue Nov 19 11:05:49 2019

"Sometimes a restart of the firehose exporter solves that issue."

Is recycling the process enough in this scenario?

Muruganfly03 avatar Nov 19 '19 11:11 Muruganfly03

You can restart it with the following commands:

    sudo su
    monit restart firehose_exporter

benjaminguttmann-avtq avatar Nov 19 '19 11:11 benjaminguttmann-avtq

Hi Benjamin

Any idea why we are receiving this kind of alert, given that data collection on the firehose_exporter appears to run without any issues?
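A running process in Monit does not by itself show that envelopes are still arriving, and the "data collected" field in `monit status` appears to be Monit's own sampling time rather than anything reported by the exporter. One way to check freshness directly is to read the exporter's metrics output (the alert above points at port 9186) and look at the last-envelope timestamp. The sketch below parses Prometheus text exposition output in Python. The metric name `firehose_last_envelope_received_timestamp`, the `environment` label, and the sample values are assumptions for illustration and should be checked against what the exporter actually exposes.

```python
import re

# Assumed metric name, used only for illustration.
METRIC = "firehose_last_envelope_received_timestamp"

# One sample line: name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def read_gauge(exposition_text, metric_name):
    """Return the first sample value of metric_name as a float, or None."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == metric_name:
            return float(match.group(3))
    return None


# Made-up scrape output; a real one would come from the /metrics endpoint.
scrape = """\
# TYPE firehose_last_envelope_received_timestamp gauge
firehose_last_envelope_received_timestamp{environment="prod"} 1.574160339e+09
"""

last_seen = read_gauge(scrape, METRIC)
now_epoch = 1574161239.0  # example "current time", 900 s after the sample
age_seconds = now_epoch - last_seen
print(age_seconds)        # 900.0
print(age_seconds > 600)  # True: older than the 10-minute threshold
```

If the age keeps growing between scrapes while Monit still reports the process as running, the exporter is alive but no longer receiving envelopes.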

Muruganfly03 avatar Nov 19 '19 11:11 Muruganfly03

Hi Benjamin

We ran into a similar issue in one more deployment. Kindly let me know what is causing it.

The firehose_exporter stopped updating after the error messages below.

    -rw-r--r-- 1 root root 50951247 Nov 20 12:57 firehose_exporter.stderr.log
    firehose/dxxxxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# date
    Thu Nov 21 04:41:34 UTC 2019

Error Message:

    time="2019-09-24T10:31:56Z" level=error msg="Value Metric from `MetronAgent` discarded: label value \"c3a7d66e-faa3-4d33-99d1-d3d6\\x00\\x00\\x00\\x00\\xc4\\n\\xc1\\x01\" is not valid UTF-8" source="value_metrics_collector.go:60"
    time="2019-09-24T10:32:57Z" level=error msg="Value Metric from `MetronAgent` discarded: label value \"c3a7d66e-faa3-4d33-99d1-d3d6\\x00\\x00\\x00\\x00\\xc4\\n\\xc1\\x01\" is not valid UTF-8" source="value_metrics_collector.go:60"
    time="2019-09-27T07:32:24Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:38620->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"
    time="2019-09-30T18:52:27Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:55164->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"
    time="2019-10-05T11:13:02Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:56936->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"
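The log entries above use a `key=value` layout with quoted values, so they can be summarized with a small script instead of being read by eye, which helps with a 50 MB stderr log. The Python sketch below groups error entries by their `source` field and reports the newest timestamp for each. The helper names `parse_line` and `latest_error_per_source` are made up for illustration, and the sample lines are copied from this thread.

```python
import re
from datetime import datetime

# key=value pairs; a value is either a double-quoted string
# (backslash escapes allowed) or a bare token without spaces.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1]  # drop the surrounding quotes
        fields[key] = raw
    return fields


def latest_error_per_source(lines):
    """Map each source to the newest error timestamp seen for it."""
    latest = {}
    for line in lines:
        f = parse_line(line)
        if f.get("level") != "error" or "time" not in f or "source" not in f:
            continue
        # Timestamps in the log are UTC ("Z"); parsed here as naive datetimes.
        ts = datetime.strptime(f["time"], "%Y-%m-%dT%H:%M:%SZ")
        if f["source"] not in latest or ts > latest[f["source"]]:
            latest[f["source"]] = ts
    return latest


SAMPLE = [
    r'time="2019-09-24T10:31:56Z" level=error msg="Value Metric from '
    r'`MetronAgent` discarded: label value \"c3a7d66e-faa3-4d33-99d1-d3d6'
    r'\\x00\\x00\\x00\\x00\\xc4\\n\\xc1\\x01\" is not valid UTF-8" '
    r'source="value_metrics_collector.go:60"',
    r'time="2019-09-27T07:32:24Z" level=error msg="Error while reading from '
    r'the Firehose: read tcp 10.xxxx:38620->10.181.36.135:443: read: '
    r'connection reset by peer" source="firehose_nozzle.go:121"',
    r'time="2019-10-05T11:13:02Z" level=error msg="Error while reading from '
    r'the Firehose: read tcp 10.xxxx:56936->10.181.36.135:443: read: '
    r'connection reset by peer" source="firehose_nozzle.go:121"',
]

for source, ts in sorted(latest_error_per_source(SAMPLE).items()):
    print(source, ts.isoformat())
# firehose_nozzle.go:121 2019-10-05T11:13:02
# value_metrics_collector.go:60 2019-09-24T10:31:56
```

Running the same summary over the full `firehose_exporter.stderr.log` would show which error type was logged last before the exporter stopped updating.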

Muruganfly03 avatar Nov 21 '19 04:11 Muruganfly03

Closed due to inactivity. Please feel free to re-open if the issue still persists.

benjaminguttmann-avtq avatar Jan 04 '23 09:01 benjaminguttmann-avtq