prometheus-boshrelease FirehoseExporterLastEnvelopeReceivedTooOld

We are using Prometheus 23.3.0 in our environment, and for the past two days we have been receiving the alert below.

The firehose_exporter at prod/10.x.x.x:9186 last Envelope received was more than 10m 0s ago.

Kindly let us know what is causing this issue and how we can fix it.

Nov 19 '19 11:11 Muruganfly03

Hi @Muruganfly03, did you check the logs from the firehose_exporter? Sometimes a restart of the firehose exporter solves that issue.

Nov 19 '19 11:11 benjaminguttmann-avtq

Hi Benjamin

I logged in to the firehose_exporter VM and found that no errors have been reported for the past two weeks. The process looks healthy and data collection is happening.

firehose/xxxxxxxxx:/var/vcap/sys/log/firehose_exporter# ls -lrt
total 16
-rw-r--r-- 1 root root 0 Aug 13 09:19 firehose_exporter.stdout.log
-rw-r--r-- 1 root root 263 Oct 23 09:46 firehose_exporter_ctl.stdout.log
-rw-r--r-- 1 root root 91 Oct 23 09:46 firehose_exporter_ctl.stderr.log
-rw-r--r-- 1 root root 5549 Nov 5 06:36 firehose_exporter.stderr.log

firehose/xxxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# date
Tue Nov 19 11:05:39 UTC 2019

firehose/xxxxxxxxx:/var/vcap/sys/log/firehose_exporter# monit summary
The Monit daemon 5.2.5 uptime: 27d 1h 19m

Process 'firehose_exporter' running
Process 'bosh-dns' running
Process 'bosh-dns-healthcheck' running
Process 'fim' running
System 'system_localhost' running

firehose/xxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# monit status
The Monit daemon 5.2.5 uptime: 27d 1h 19m

Process 'firehose_exporter'
status running
monitoring status monitored
pid 8201
parent pid 1
uptime 27d 1h 19m
children 0
memory kilobytes 44976
memory kilobytes total 44976
memory percent 1.1%
memory percent total 1.1%
cpu percent 0.0%
cpu percent total 0.0%
data collected Tue Nov 19 11:05:49 2019

You wrote: "Sometimes a restart of the firehose exporter solves that issue."

Is restarting the process enough in this scenario?

Nov 19 '19 11:11 Muruganfly03

You can restart it with the following commands:

sudo su
monit restart firehose_exporter

Nov 19 '19 11:11 benjaminguttmann-avtq

Hi Benjamin

Any idea why we are receiving this kind of alert, given that data collection on the firehose_exporter is happening without any issues?

Nov 19 '19 11:11 Muruganfly03
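
The `monit status` output above shows that the process is running, which does not by itself show that envelopes are still arriving. One way to check that directly is to read the exporter's own last-envelope timestamp from its metrics endpoint and compare it with the current time. The following is a minimal sketch only: the metric name `firehose_last_envelope_received_timestamp`, the `/metrics` path, and the helper name `last_envelope_age` are assumptions, not taken from this thread, so confirm them against your exporter's actual output.

```shell
#!/bin/sh
# ASSUMPTION: the metric name below is not confirmed by this thread.
# Check your exporter's /metrics output and override METRIC if it differs.
METRIC="${METRIC:-firehose_last_envelope_received_timestamp}"

# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the age, in whole seconds, of the first sample of $METRIC,
# relative to "now" (epoch seconds passed as $1).
last_envelope_age() {
  awk -v metric="$METRIC" -v now="$1" '
    # Sample lines start with the metric name; "# HELP" / "# TYPE" lines do not.
    index($0, metric) == 1 {
      next_char = substr($0, length(metric) + 1, 1)
      # Accept "name{labels} value" or "name value", but not a longer metric name.
      if (next_char == "{" || next_char == " ") {
        printf "%d\n", now - $NF
        exit
      }
    }'
}

# Example use (address redacted as in the alert; /metrics path is an assumption):
#   curl -s http://10.x.x.x:9186/metrics | last_envelope_age "$(date +%s)"
```

If the printed age keeps growing past 600 seconds while the process is still listed as running, the exporter is up but no longer receiving envelopes, which would match the alert text.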

Hi Benjamin

We have hit a similar issue in one more deployment. Kindly let me know what is causing it.

The firehose_exporter stopped updating after the error messages below.

-rw-r--r-- 1 root root 50951247 Nov 20 12:57 firehose_exporter.stderr.log
firehose/dxxxxxxxxxxxxxxx:/var/vcap/sys/log/firehose_exporter# date
Thu Nov 21 04:41:34 UTC 2019

Error Message:
time="2019-09-24T10:31:56Z" level=error msg="Value Metric from `MetronAgent` discarded: label value \"c3a7d66e-faa3-4d33-99d1-d3d6\\x00\\x00\\x00\\x00\\xc4\\n\\xc1\\x01\" is not valid UTF-8" source="value_metrics_collector.go:60"
time="2019-09-24T10:32:57Z" level=error msg="Value Metric from MetronAgent discarded: label value "c3a7d66e-faa3-4d33-99d1-d3d6\x00\x00\x00\x00\xc4\n\xc1\x01" is not valid UTF-8" source="value_metrics_collector.go:60"
time="2019-09-27T07:32:24Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:38620->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"
time="2019-09-30T18:52:27Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:55164->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"
time="2019-10-05T11:13:02Z" level=error msg="Error while reading from the Firehose: read tcp 10.xxxx:56936->10.181.36.135:443: read: connection reset by peer" source="firehose_nozzle.go:121"

Nov 21 '19 04:11 Muruganfly03
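
The excerpt above contains two kinds of error: metrics discarded because a label value is not valid UTF-8, and Firehose reads failing with "connection reset by peer". With a stderr log of roughly 50 MB it can help to count how often each kind occurs before digging further. The sketch below assumes the log has one entry per line and matches only the message strings quoted in this thread; the function name `summarize_firehose_errors` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a firehose_exporter stderr log on stdin
# (one entry per line) and prints a count per error category.
summarize_firehose_errors() {
  awk '
    /level=error/ {
      if ($0 ~ /is not valid UTF-8/)            utf8++
      else if ($0 ~ /connection reset by peer/) reset++
      else                                      other++
    }
    END {
      # Unset counters print as 0.
      printf "invalid_utf8=%d connection_reset=%d other=%d\n", utf8, reset, other
    }'
}

# Example use on the exporter VM (path as shown in this thread):
#   summarize_firehose_errors < /var/vcap/sys/log/firehose_exporter/firehose_exporter.stderr.log
```

The counts only show which category dominates the log. They do not establish which one, if either, is behind the alert.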

Closed due to inactivity. Please feel free to reopen if the issue still persists.

Jan 04 '23 09:01 benjaminguttmann-avtq