Exporter panics on invalid/unexpected metric names
The results from this issue were broken up into several issues. The part that remains here is:
- graphite_exporter should implement a counter of rejected metrics and expose it as part of its own metrics.
- graphite_exporter should not panic and exit when an invalid metric is received.
- graphite_exporter in debug mode should log the offending metrics in a log file (or in the debug output of the web page, for simplicity's sake).
Original investigation:
==========
The majority of our metrics contain the character _ in various parts of the metric name.
labels.go has a validation function, validateLabelValues, that checks whether the expected number of labels is consistent with the original number of dot-delimited fields.
With a metric like: hostname_function.source.description.metric_blah.count 2.0 timestamp
the exporter gets confused between hostname_function and hostname function because of how it splits on underscores. _ is valid in both Graphite and Prometheus names and should be handled.

UPDATE: When using mappings, this problem can be bypassed by assigning parts of the Graphite metric name to labels and extracting a good metric name. See the comment below. The issue is still valid, as the exporter should not panic when encountering an unexpected, unsupported, or supported-but-problematic metric name.
~~Suggested fix: A temporary token should stand in for the underscores detected in the input, and the underscores should be restored when creating the label names exposed to Prometheus.~~
Version 0.5 panics. Version 0.4.2 drops the offending metrics and exposes only the "valid" ones, but the graphite_exporter is then considered down and errors such as "metric after HELP is INVALID" show up in the syslog.

    func validateLabelValues(vals []string, expectedNumberOfValues int) error {
        if len(vals) != expectedNumberOfValues {
            return fmt.Errorf(
                "%s: expected %d label values but got %d in %#v",
                errInconsistentCardinality, expectedNumberOfValues,
                len(vals), vals,
            )
        }
        for _, val := range vals {
            if !utf8.ValidString(val) {
                return fmt.Errorf("label value %q is not valid UTF-8", val)
            }
        }
        return nil
    }

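
Note that the function above only returns an error; whether that error ends in a panic depends on the caller. The sketch below assumes, purely for illustration, a Must-style caller that panics on the error, and shows how a `recover` wrapper could turn such a panic back into an ordinary error. `checkCount`, `mustBuild`, and `safeBuild` are hypothetical names, and this is not a claim about the actual call path inside the exporter.

```go
package main

import (
	"errors"
	"fmt"
)

var errInconsistentCardinality = errors.New("inconsistent label cardinality")

// checkCount mirrors only the length check of validateLabelValues.
func checkCount(vals []string, expected int) error {
	if len(vals) != expected {
		return fmt.Errorf(
			"%s: expected %d label values but got %d in %#v",
			errInconsistentCardinality, expected, len(vals), vals,
		)
	}
	return nil
}

// mustBuild is a hypothetical Must-style caller: it panics on any error.
func mustBuild(vals []string, expected int) string {
	if err := checkCount(vals, expected); err != nil {
		panic(err)
	}
	return fmt.Sprint(vals)
}

// safeBuild converts a panic raised by mustBuild back into an error,
// so the caller can count and skip the offending metric.
func safeBuild(vals []string, expected int) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return mustBuild(vals, expected), nil
}

func main() {
	// Two values where one label is expected: an error, not a crash.
	_, err := safeBuild([]string{"hostname", "function"}, 1)
	fmt.Println("error:", err)
}
```

Where a non-panicking variant of the call exists, using it and checking the returned error is usually cleaner than recovering after the fact.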
After further testing here are my conclusions and suggestions:
- graphite_exporter should implement a counter of rejected metrics and expose it as part of its own metrics.
- graphite_exporter should not panic and exit when an invalid metric is received.
- graphite_exporter in debug mode should log the offending metrics in a log file (or in the debug output of the web page, for simplicity's sake).
- Documentation should explain that any underscores in the received Graphite-formatted metric will be rejected UNLESS there is a regex or glob match to extract the various fields and build metric names that are valid and will not be rejected by the various internal validations.
- Provide examples using regex, not just glob, in the documentation.
- Provide an example using a catch-all regex for metrics not matching an existing regex.
- Explain the difference between using the configuration for dropping all non-matching metrics and using a catch-all regex to provide an indication that metrics are not matching and would have been dropped.
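
To illustrate the regex and catch-all ideas from the list above, here is a rough sketch using Go's standard `regexp` package. It is not the exporter's mapping configuration or implementation: the `rule` type, the patterns, and the names `events_count` and `graphite_unmatched` are all made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule maps a Graphite metric name to a metric name plus labels.
type rule struct {
	pattern    *regexp.Regexp
	name       string   // resulting metric name
	labelNames []string // one label per capture group, in order
}

// rules are tried in order; the last one is a catch-all.
// All patterns and names here are hypothetical.
var rules = []rule{
	{
		pattern:    regexp.MustCompile(`^([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.count$`),
		name:       "events_count",
		labelNames: []string{"host", "source", "description", "metric"},
	},
	{
		// Catch-all: keeps unmatched names visible instead of dropping them.
		pattern:    regexp.MustCompile(`^(.*)$`),
		name:       "graphite_unmatched",
		labelNames: []string{"original_name"},
	},
}

// mapName returns the metric name and labels from the first matching rule.
func mapName(graphiteName string) (string, map[string]string) {
	for _, r := range rules {
		m := r.pattern.FindStringSubmatch(graphiteName)
		if m == nil {
			continue
		}
		labels := make(map[string]string, len(r.labelNames))
		for i, ln := range r.labelNames {
			labels[ln] = m[i+1] // m[0] is the whole match
		}
		return r.name, labels
	}
	return "", nil // not reached while the catch-all rule is present
}

func main() {
	for _, n := range []string{
		"hostname_function.source.description.metric_blah.count",
		"something.else",
	} {
		name, labels := mapName(n)
		fmt.Println(name, labels)
	}
}
```

Underscores survive here because each dot-delimited field is captured whole into a label value. Compared with silently dropping non-matching metrics, the catch-all makes them visible, at the cost of a potentially high-cardinality label.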
I agree with all of these points! In particular,

> Documentation should explain that any underscores in the received graphite formatted metric will be rejected

would be a stop-gap measure to document a known issue – if underscores are valid in Graphite, then we need to handle them.
Thank you for the thorough investigation and write-up. I broke out your list into 3 separate issues:
- graceful handling of unexpected input (this issue, renamed)
- handling underscores (#81)
- improving the documentation (#82)
Help with any of these is highly appreciated!
My pleasure. I will try to improve the docs with my suggestions (when I get a bit of free time).