graphite_exporter

Exporter panics on invalid/unexpected metric names

Open • xkilian opened this issue 6 years ago • 4 comments

The findings from this issue were split into several issues. The part that remains here is:

  • graphite_exporter should implement a counter of rejected metrics and expose it as part of its own metrics.
  • graphite_exporter should not panic and exit when an invalid metric is received.
  • graphite_exporter in debug mode should log the offending metrics to a log file (or, for simplicity's sake, to the debug output of the web page).

Original investigation:

==========

Most of our metrics contain the underscore character `_` in various parts of the metric name.

`labels.go` has a validation function, `validateLabelValues`, that checks whether the number of label values matches the expected number, i.e. whether it is consistent with the original number of dot-delimited fields.

Consider a metric such as: `hostname_function.source.description.metric_blah.count 2.0 timestamp`

The exporter confuses `hostname_function` with `hostname` and `function`, due to the underscores being split. `_` is a valid character in both Graphite and Prometheus names and should be handled. UPDATE: When using match rules, this problem can be bypassed by assigning parts of the Graphite metric name to labels and extracting a valid metric name. See comment below. The issue is still valid, as the exporter should not panic when it encounters an unexpected, unsupported, or supported-but-problematic metric name.
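To illustrate the failure mode described above with a small model (not the exporter's actual code): splitting the example name on `.` alone yields 5 components, while also treating `_` as a separator yields 7. A mismatch like that is what `validateLabelValues` reports as inconsistent cardinality. The helper names `dotFields` and `dotAndUnderscoreFields` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// dotFields splits a Graphite name on '.' only, which is how Graphite
// delimits path components.
func dotFields(name string) []string {
	return strings.Split(name, ".")
}

// dotAndUnderscoreFields also treats '_' as a separator. This models the
// problem described in the issue; it is not graphite_exporter's code.
func dotAndUnderscoreFields(name string) []string {
	return strings.FieldsFunc(name, func(r rune) bool {
		return r == '.' || r == '_'
	})
}

func main() {
	name := "hostname_function.source.description.metric_blah.count"
	expected := dotFields(name)
	got := dotAndUnderscoreFields(name)
	if len(expected) != len(got) {
		fmt.Printf("inconsistent cardinality: expected %d label values but got %d\n",
			len(expected), len(got))
	}
}
```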

~~Suggested fix: A temporary token should be used to treat initial underscores detected and restoring them when creating the label names to be exposed to Prometheus.~~

Version 0.5 panics. Version 0.4.2 drops the offending metrics and exposes only the "valid" ones, but the graphite_exporter is then considered down, and errors stating that the metric after HELP is INVALID are thrown in the syslog.

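```go
package prometheus

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInconsistentCardinality is declared elsewhere in the same file; it is
// repeated here so the excerpt compiles on its own.
var errInconsistentCardinality = errors.New("inconsistent label cardinality")

// validateLabelValues returns an error if the number of label values differs
// from the expected number, or if any value is not valid UTF-8.
func validateLabelValues(vals []string, expectedNumberOfValues int) error {
	if len(vals) != expectedNumberOfValues {
		return fmt.Errorf(
			"%s: expected %d label values but got %d in %#v",
			errInconsistentCardinality, expectedNumberOfValues,
			len(vals), vals,
		)
	}

	for _, val := range vals {
		if !utf8.ValidString(val) {
			return fmt.Errorf("label value %q is not valid UTF-8", val)
		}
	}

	return nil
}
```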
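`validateLabelValues` itself only returns an error; whether that error crashes the process depends on the caller. The Prometheus Go client follows a convention in which `Must`-prefixed helpers such as `MustNewConstMetric` panic where their counterparts (`NewConstMetric`) return the error, and `WithLabelValues` panics where `GetMetricWithLabelValues` would return one. Which caller the exporter uses at this point is not stated in the issue. The sketch below contrasts the two styles using a copy of the function; `mustLabelValues` and `tryLabelValues` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Same sentinel and validation logic as the excerpt above.
var errInconsistentCardinality = errors.New("inconsistent label cardinality")

func validateLabelValues(vals []string, expectedNumberOfValues int) error {
	if len(vals) != expectedNumberOfValues {
		return fmt.Errorf(
			"%s: expected %d label values but got %d in %#v",
			errInconsistentCardinality, expectedNumberOfValues,
			len(vals), vals,
		)
	}
	for _, val := range vals {
		if !utf8.ValidString(val) {
			return fmt.Errorf("label value %q is not valid UTF-8", val)
		}
	}
	return nil
}

// mustLabelValues mirrors the Must* convention: any validation error panics,
// which stops the whole process if nothing recovers it.
func mustLabelValues(vals []string, n int) []string {
	if err := validateLabelValues(vals, n); err != nil {
		panic(err)
	}
	return vals
}

// tryLabelValues returns the error, so the caller can drop and count the
// offending metric and keep serving the others.
func tryLabelValues(vals []string, n int) ([]string, error) {
	if err := validateLabelValues(vals, n); err != nil {
		return nil, err
	}
	return vals, nil
}

func main() {
	// Seven values where five were expected, as when the example name is
	// split on '_' as well as '.'.
	vals := []string{"hostname", "function", "source", "description", "metric", "blah", "count"}
	if _, err := tryLabelValues(vals, 5); err != nil {
		fmt.Println("dropped:", err)
	}
}
```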

xkilian • Mar 15 '19 22:03

After further testing here are my conclusions and suggestions:

  • graphite_exporter should implement a counter of rejected metrics and expose it as part of its own metrics.
  • graphite_exporter should not panic and exit when an invalid metric is received.
  • graphite_exporter in debug mode should log the offending metrics to a log file (or, for simplicity's sake, to the debug output of the web page).
  • Documentation should explain that any underscores in the received Graphite-formatted metric will be rejected UNLESS there is a regex or glob match that extracts the various fields and builds valid metric names that will not be rejected by the various internal validations.
  • Provide examples in the documentation that use regex matching, not just globs.
  • Provide an example using a catch-all regex for metrics not matching an existing regex.
  • Explain the difference between configuring the exporter to drop all non-matching metrics and using a catch-all regex, which gives an indication that metrics are not matching and would otherwise have been dropped.
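A small Go model of the regex and catch-all suggestions, assuming rules are tried in order and the first match wins. This is not graphite_exporter's mapping configuration format; `rule`, `mapName`, the label names, and the `graphite_unmatched` metric name are all hypothetical. The specific rule uses `[^.]+` for each component, so underscores stay inside a component instead of being treated as separators, and the catch-all keeps unmatched names visible instead of silently dropping them.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical mapping rule: a regex plus a function that builds
// a metric name and labels from the capture groups.
type rule struct {
	re    *regexp.Regexp
	build func(m []string) (string, map[string]string)
}

var rules = []rule{
	{
		// Specific rule: four components followed by ".count".
		re: regexp.MustCompile(`^([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)\.count$`),
		build: func(m []string) (string, map[string]string) {
			return m[4] + "_count", map[string]string{
				"host_function": m[1],
				"source":        m[2],
				"description":   m[3],
			}
		},
	},
	{
		// Catch-all: anything the rules above missed ends up under one
		// clearly named metric, which signals that a mapping is missing.
		re: regexp.MustCompile(`^(.+)$`),
		build: func(m []string) (string, map[string]string) {
			return "graphite_unmatched", map[string]string{"original_name": m[1]}
		},
	},
}

// mapName returns the name and labels from the first matching rule, or
// ok == false if no rule matches.
func mapName(graphiteName string) (name string, labels map[string]string, ok bool) {
	for _, r := range rules {
		if m := r.re.FindStringSubmatch(graphiteName); m != nil {
			name, labels = r.build(m)
			return name, labels, true
		}
	}
	return "", nil, false
}

func main() {
	for _, n := range []string{
		"hostname_function.source.description.metric_blah.count",
		"some.other.metric",
	} {
		name, labels, _ := mapName(n)
		fmt.Println(name, labels)
	}
}
```

Putting the raw Graphite name into a label, as the catch-all does here, can create many series, so it works best as a diagnostic signal while mappings are being written. Dropping non-matching metrics avoids that cost but gives no indication that anything was lost.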

xkilian • Apr 04 '19 16:04

I agree with all of these points! In particular,

> Documentation should explain that any underscores in the received graphite formatted metric will be rejected

would be a stop-gap measure to document a known issue – if underscores are valid in Graphite, then we need to handle them.

matthiasr • Apr 08 '19 12:04

Thank you for the thorough investigation and write-up. I broke out your list into 3 separate issues:

  • graceful handling of unexpected input (this issue, renamed)
  • handling underscores (#81)
  • improving the documentation (#82)

Help with any of these is highly appreciated!

matthiasr • Apr 08 '19 12:04

My pleasure. I will try to improve the docs with my suggestions when I get a bit of free time.

xkilian • Apr 09 '19 00:04