haproxy_exporter

MAINT status should not return 0

jzielke84 opened this issue 8 years ago • 16 comments

When a frontend or backend is set to maintenance mode, it is down on purpose and should not report a failure state to Prometheus/Grafana by returning 0. Instead, I suggest a return value of 2 so that Grafana can give that status a distinct color when a load balancer has such a setting configured.

We tried changing the return value in the code, but that didn't seem to work.

jzielke84, Dec 12 '17

To avoid having to interpret different numeric status codes, I would propose renaming the up metric to something like:

status{status="up", backend="my_backend_name"} 42
status{status="down", backend="my_backend_name"} 5
status{status="maint", backend="my_backend_name"} 18
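To make the proposal above concrete, here is a minimal stdlib-only sketch that tallies servers per status for one backend and prints lines in that format. It assumes the exporter already has the raw status string of every server in the backend; `countStatuses` and `formatStatusLines` are hypothetical helpers, not exporter code, and transitional values such as "UP 1/3" are not normalised here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// countStatuses tallies how many servers of one backend are in each status.
// Statuses are lower-cased to match the proposed label values; a value such
// as "UP 1/3" would become its own label value ("up 1/3") in this sketch.
func countStatuses(statuses []string) map[string]int {
	counts := make(map[string]int)
	for _, s := range statuses {
		counts[strings.ToLower(s)]++
	}
	return counts
}

// formatStatusLines renders the counts in the proposed style, sorted by
// status so that the output order is deterministic.
func formatStatusLines(backend string, counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("status{status=%q, backend=%q} %d", k, backend, counts[k]))
	}
	return lines
}

func main() {
	// Hypothetical backend with five servers.
	statuses := []string{"UP", "UP", "MAINT", "DOWN", "UP"}
	for _, line := range formatStatusLines("my_backend_name", countStatuses(statuses)) {
		fmt.Println(line)
	}
}
```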

WDYT @grobie?

wdauchy, Dec 13 '17

I strongly recommend this change.

jzielke84, Dec 19 '17

Waiting for @grobie's acknowledgment before starting on a patch.

wdauchy, Dec 19 '17

I wouldn't change the current up metric, it's being used in many dashboards and alert expressions. I'd be fine adding an additional metric, but I'm a bit worried about the label cardinality. I'm counting at least 9 different status values. Maybe only add that metric to backend and frontend lines?
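The cardinality concern can be put in rough numbers. This sketch assumes a layout that exports every status value for every server (one series per server per status) and compares it with a boolean up series per server and with per-backend counts only. The sizes used (50 backends, 20 servers each, 9 statuses) are hypothetical, not measurements. Exporting only the currently active status per server keeps the live series count at one per server, but creates a new series every time a server changes status.

```go
package main

import "fmt"

// seriesCounts compares how many time series each metric layout produces.
// All inputs are hypothetical sizes chosen for illustration.
func seriesCounts(backends, serversPerBackend, statuses int) (upSeries, oneHotSeries, perBackendSeries int) {
	servers := backends * serversPerBackend
	upSeries = servers                     // one boolean up series per server
	oneHotSeries = servers * statuses      // one series per server per status value
	perBackendSeries = backends * statuses // one count per backend per status value
	return
}

func main() {
	up, oneHot, perBackend := seriesCounts(50, 20, 9)
	fmt.Println(up, oneHot, perBackend) // 1000 9000 450
}
```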

grobie, Dec 19 '17

@grobie the MAINT status is only available in the backend. At the very least, MAINT shouldn't return 0, because planned maintenance is by definition neither an error nor an unwanted condition. Normally, the HAProxy status page only shows the statuses UP, DOWN, MAINT and DRAIN (see: Link).

Also, DRAIN is a deliberate action taken by the proxy's administrators and thus should not be considered a failure either. Generally speaking, UP and DOWN are states that can be reached without any intervention (e.g. in an error case), while DRAIN and MAINT are forced by an operator and should be treated differently. Hence my suggestion to return 2 and expand the HAProxy metrics.
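The split described above, between states an operator forces and states HAProxy reaches on its own, could be expressed as a small classifier. This is a sketch under the assumption that raw status strings start with the base state (e.g. "UP 1/3", "MAINT"); the `Cause` type and `classify` function are hypothetical and not part of HAProxy or the exporter.

```go
package main

import (
	"fmt"
	"strings"
)

// Cause records why a server is in its current state, following the
// argument above. It is an illustrative concept, not an HAProxy one.
type Cause int

const (
	Spontaneous    Cause = iota // UP, DOWN and their transitional forms
	OperatorForced              // MAINT, DRAIN
	Unknown                     // anything else, e.g. "no check" or "NOLB"
)

func (c Cause) String() string {
	return [...]string{"spontaneous", "operator-forced", "unknown"}[c]
}

// classify matches on the prefix so that variants such as "UP 1/3" are
// grouped with their base state.
func classify(status string) Cause {
	s := strings.ToUpper(strings.TrimSpace(status))
	switch {
	case strings.HasPrefix(s, "MAINT"), strings.HasPrefix(s, "DRAIN"):
		return OperatorForced
	case strings.HasPrefix(s, "UP"), strings.HasPrefix(s, "DOWN"):
		return Spontaneous
	default:
		return Unknown
	}
}

func main() {
	for _, s := range []string{"UP", "UP 1/3", "DOWN", "MAINT", "DRAIN", "no check"} {
		fmt.Printf("%-8s %s\n", s, classify(s))
	}
}
```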

jzielke84, Dec 20 '17

OK, I can add what I proposed.

wdauchy, Dec 20 '17

The up metric is a common pattern in Prometheus: a boolean with the values 0 or 1. An instance or server that can't serve requests is not up; whether it is down because of errors or because of maintenance is not something this metric answers. If that distinction is relevant to users, I'm happy to accept a PR that adds a new metric broken down by status type.
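Keeping up strictly boolean means every non-serving state maps to 0, whatever its cause. A minimal sketch, assuming that only "UP", transitional "UP n/m" values and a frontend's "OPEN" count as serving; `upValue` is a hypothetical helper, not the exporter's actual parsing code.

```go
package main

import (
	"fmt"
	"strings"
)

// upValue returns 1 if the server can serve requests and 0 otherwise,
// regardless of why it is not serving. Which raw statuses count as
// "serving" is an assumption of this sketch.
func upValue(status string) int {
	s := strings.TrimSpace(status)
	if s == "UP" || strings.HasPrefix(s, "UP ") || s == "OPEN" {
		return 1
	}
	return 0 // DOWN, MAINT, DRAIN, ... all look the same to up
}

func main() {
	for _, s := range []string{"UP", "UP 2/3", "OPEN", "DOWN", "MAINT", "DRAIN"} {
		fmt.Printf("%-7s up=%d\n", s, upValue(s))
	}
}
```

Any distinction between the zeros then has to come from a separate status metric rather than from the value of up.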

grobie, Dec 21 '17

I would very much like the status to be exposed, particularly per server. A few thoughts (partly echoing what has already been said):

  1. A "return code of 2" would be meaningless once you start summing across metrics.
  2. Not (numerically) counting MAINT as down is dangerous. Let's say I alert when I have fewer than 5 servers ready. I normally have 10 servers in the backend; I take 2 down for maintenance (MAINT), then 4 of the remaining servers fail. If MAINT isn't counted as down, I will not get an alert.
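The arithmetic behind both points can be checked directly. This sketch assumes the alert is effectively "sum of up over the backend's servers < 5" and compares three illustrative encodings of MAINT for the scenario described: 10 servers, 2 in MAINT, 4 of the remaining 8 DOWN, leaving 4 UP. The `encoding` type and `sumUp` function are hypothetical.

```go
package main

import "fmt"

// encoding maps a status to the value it contributes to the alert's sum.
// Statuses missing from the map contribute 0.
type encoding map[string]int

// sumUp adds up the encoded values for a backend, given how many servers
// are in each status.
func sumUp(counts map[string]int, enc encoding) int {
	total := 0
	for status, n := range counts {
		total += n * enc[status]
	}
	return total
}

func main() {
	counts := map[string]int{"UP": 4, "DOWN": 4, "MAINT": 2}
	cases := []struct {
		name string
		enc  encoding
	}{
		{"MAINT as down", encoding{"UP": 1, "DOWN": 0, "MAINT": 0}},     // sum=4, alert fires
		{"MAINT not down", encoding{"UP": 1, "DOWN": 0, "MAINT": 1}},    // sum=6, no alert
		{"MAINT returns 2", encoding{"UP": 1, "DOWN": 0, "MAINT": 2}},   // sum=8, no alert
	}
	for _, c := range cases {
		s := sumUp(counts, c.enc)
		fmt.Printf("%-16s sum=%d alert=%t\n", c.name, s, s < 5)
	}
}
```

Only the boolean encoding catches that just 4 servers are actually serving; with a value of 2 the sum (8) no longer counts anything meaningful.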

I would support leaving the current up metric as is and creating a new metric, e.g.:

haproxy_server_status{status="MAINT",backend="foo",instance="bar",job="haproxy",server="server1"} 1
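One way to produce such a metric is a one-hot encoding per server. This sketch assumes a deliberately short, hypothetical list of status values and also emits 0 for the inactive ones (a "state set" style); emitting only the active status with value 1, as in the example line, would work too. The `job` and `instance` labels are attached by Prometheus at scrape time, so the exporter side only needs `backend`, `server` and `status`. `serverStatusLines` is a hypothetical helper.

```go
package main

import "fmt"

// knownStatuses is an illustrative subset; HAProxy reports more values.
var knownStatuses = []string{"UP", "DOWN", "MAINT", "DRAIN"}

// serverStatusLines emits one haproxy_server_status line per known status,
// with value 1 for the server's current status and 0 for all others.
func serverStatusLines(backend, server, current string) []string {
	lines := make([]string, 0, len(knownStatuses))
	for _, s := range knownStatuses {
		v := 0
		if s == current {
			v = 1
		}
		lines = append(lines, fmt.Sprintf(
			"haproxy_server_status{backend=%q,server=%q,status=%q} %d",
			backend, server, s, v))
	}
	return lines
}

func main() {
	for _, line := range serverStatusLines("foo", "server1", "MAINT") {
		fmt.Println(line)
	}
}
```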

Transitional statuses could perhaps be normalised, e.g. "UP 1/3" and "UP 2/3" would both become "UP".
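A possible normalisation, assuming transitional statuses always have the form "<STATE> n/m" (as in "UP 1/3" or "DOWN 1/2"); anything else, such as "no check", is passed through unchanged. `normalizeStatus` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// transitional matches a base state followed by a health-check counter,
// e.g. "UP 1/3" or "DOWN 1/2".
var transitional = regexp.MustCompile(`^(\S+) \d+/\d+$`)

// normalizeStatus strips the counter from transitional statuses and
// leaves every other status as it is.
func normalizeStatus(status string) string {
	s := strings.TrimSpace(status)
	if m := transitional.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return s
}

func main() {
	for _, s := range []string{"UP 1/3", "UP 2/3", "DOWN 1/2", "MAINT", "no check"} {
		fmt.Printf("%-9q -> %q\n", s, normalizeStatus(s))
	}
}
```

Matching on the "n/m" suffix rather than simply keeping the first word avoids turning multi-word statuses like "no check" into "no".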

Tom-Fawcett, Dec 30 '17

So rather than changing the existing metric, adding a new one is the right way to go: users can decide whether to use the new information without breaking anything they've built so far.

Unfortunately I'm not familiar with Go syntax, and although I understand parts of it, I think it's better if someone who knows what they're doing opens the pull request. @Tom-Fawcett's suggestion seems to point in the right direction.

jzielke84, Jan 02 '18

Any plans on implementing this one in the near future?

jzielke84, Feb 16 '18

@jzielke-nli I have opened #101. Depending on feedback it may require some additional work.

Tom-Fawcett, Feb 26 '18

@grobie Could you please merge this change if it looks OK?

jzielke84, Apr 27 '18

With a server's MAINT status exposed as a new metric, as in this thread's example, how would that work with Grafana, where you have a single panel for a server's status?
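With a one-hot status metric, a single panel usually shows whichever series currently has the value 1; in PromQL, a filter such as `haproxy_server_status{server="server1"} == 1` keeps only that series, and its `status` label can be displayed as text. The same selection logic is sketched below in Go on hypothetical sample data, assuming exactly one status should be active; `activeStatus` is illustrative, not a Grafana or Prometheus API.

```go
package main

import "fmt"

// activeStatus picks the status whose sample value is 1 for one server.
// The input maps status label values to sample values. If no status or
// more than one status is active, the result is reported as unknown.
func activeStatus(samples map[string]float64) (string, bool) {
	var active []string
	for status, v := range samples {
		if v == 1 {
			active = append(active, status)
		}
	}
	if len(active) != 1 {
		return "", false
	}
	return active[0], true
}

func main() {
	samples := map[string]float64{"UP": 0, "DOWN": 0, "MAINT": 1, "DRAIN": 0}
	if s, ok := activeStatus(samples); ok {
		fmt.Println("server1:", s)
	} else {
		fmt.Println("server1: unknown")
	}
}
```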

Shadow00Caster, May 18 '18

Any update on this? @grobie

jnogol, Aug 29 '18

Dead end here?

jzielke84, Aug 02 '19

Any update on this one?

ekm1908, Nov 19 '19

Hi, I am closing this issue because we are retiring this exporter. We will not be implementing new features anymore.

Please use the Prometheus support in HAProxy directly. It may already support this; if not, please open an issue against the HAProxy repository.

matthiasr, Feb 15 '23