UTF-8 support in metric and label names
Adds UTF-8 support for metric and label names.
These changes are based on the work done on the Prometheus common libraries here and here
- The `prometheus-metrics-exposition-formats` module will use the new quoting syntax `{"foo"}` iff the metric does not conform to the legacy name format (`foo{}`).
- The `prometheus-metrics-model` module has a new flag (`NameValidationScheme`) that determines if validation is done using the legacy or the UTF-8 scheme. This flag can be set via a property in the properties file.
- Scrapers can announce via content negotiation that they support UTF-8 names by adding `escaping=allow-utf-8` in the Accept header. In cases where UTF-8 is not available, metric providers can be configured to escape names in a few different ways: `values` (`U__` UTF value escaping for perfect round-tripping), `underscores` (all invalid chars become `_`), `dots` (dots become `_dot_`, `_` becomes `__`, all other values become `___`). Escaping can either be a global default (`PrometheusNaming.nameEscapingScheme`) or be specified in the Accept header with the `escaping=` term, which can be `allow-utf-8` (for UTF-8-compatible), `underscores`, `dots`, or `values`. This should still be a noop for existing configurations because scrapers will not be passing the escaping key in the Accept header. Existing functionality is maintained.
- The `prometheus-metrics-exporter-pushgateway` module will escape UTF-8 grouping keys in the URL path used when pushing metrics (see https://github.com/prometheus/pushgateway/pull/689)
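To make the negotiation and escaping described above concrete, here is a minimal sketch, not the code in this PR. The class `EscapingSketch`, its enum, and both helpers are hypothetical names made up for illustration. It assumes the legacy character set is `[a-zA-Z_:][a-zA-Z0-9_:]*`, takes the first `escaping=` term found anywhere in the Accept header (a real handler would negotiate per media type), and only implements `allow-utf-8` and `underscores`; `dots` and `values` are left out to keep it short.

```java
import java.util.Locale;

// Hypothetical illustration only; these names are not part of client_java.
final class EscapingSketch {
    enum Scheme { ALLOW_UTF8, UNDERSCORES, DOTS, VALUES }

    // Reads the first escaping= term from an Accept header.
    // Falls back to the configured default when the term is absent or unknown,
    // which is why existing scrapers see no change in behavior.
    static Scheme fromAcceptHeader(String accept, Scheme fallback) {
        if (accept == null) {
            return fallback;
        }
        for (String part : accept.split("[;,]")) {
            String p = part.trim().toLowerCase(Locale.ROOT);
            if (p.startsWith("escaping=")) {
                switch (p.substring("escaping=".length())) {
                    case "allow-utf-8": return Scheme.ALLOW_UTF8;
                    case "underscores": return Scheme.UNDERSCORES;
                    case "dots": return Scheme.DOTS;
                    case "values": return Scheme.VALUES;
                    default: return fallback;
                }
            }
        }
        return fallback;
    }

    // Assumed legacy rule: letters, '_' and ':' anywhere, digits after the first position.
    static boolean isLegacyChar(int cp, int index) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                || cp == '_' || cp == ':'
                || (index > 0 && cp >= '0' && cp <= '9');
    }

    // Applies the negotiated scheme to a metric name.
    static String escape(String name, Scheme scheme) {
        if (scheme == Scheme.ALLOW_UTF8) {
            return name; // scraper accepts UTF-8, nothing to do
        }
        if (scheme != Scheme.UNDERSCORES) {
            throw new UnsupportedOperationException("not covered by this sketch: " + scheme);
        }
        StringBuilder out = new StringBuilder();
        int index = 0;
        for (int i = 0; i < name.length(); index++) {
            int cp = name.codePointAt(i);
            if (isLegacyChar(cp, index)) {
                out.appendCodePoint(cp);
            } else {
                out.append('_'); // one underscore per invalid code point
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, a scraper that sends no `escaping=` term gets the fallback scheme, and a name such as `http.requests` comes out as `http_requests` under `underscores`.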
Work towards https://github.com/prometheus/prometheus/issues/13095
Given the ongoing discussion about unit suffixes for OM 2.0 (https://github.com/prometheus/OpenMetrics/issues/286), I think we can take this UTF-8 work as a basis and then add the necessary changes to comply with the final consensus on suffixes.
@fstab are you ok with that?
Update: The client_java maintainers just decided that we'll wait 6 more weeks, until 1 April 2025. If we have OpenMetrics 2.0 by then we will implement that. If OpenMetrics 2.0 is still under discussion in 6 weeks we will merge this PR.
@fstab Good to know, thanks for the update!
Hello @fstab, I just wanted to follow up on your last comment. It seems OM 2.0 is still under discussion, so do you think now is a good time to reconsider merging this PR?
We have our client_java community call tomorrow, and can discuss this there. If you have time, feel free to join. See the public Prometheus calendar linked here: https://prometheus.io/community/
@fedetorres93 thanks for the PR!
let me start with some high level questions before an in-depth review:
- I think we need a setting for https://prometheus.io/docs/guides/utf8/#otlp-metrics in `ExporterOpenTelemetryProperties`
- the statics in `PrometheusNaming` should be final - e.g. the scrape handler should pass the escaping as an argument
@zeitlinger Thanks for the feedback. I made the statics in PrometheusNaming final as you suggested.
About adding a setting in ExporterOpenTelemetryProperties, IIUC that module is translating from Prometheus to OTel format, so it should continue working as it is now. Prometheus' UTF-8 configs affect translations from OTel to Prometheus.
> `PrometheusNaming.nameEscapingScheme`
I can't find that
for formatting, checkstyle issues see CONTRIBUTING.md
@zeitlinger Thank you very much for the feedback.
I've addressed your comments and updated the original PR description to reflect the current variable names, sorry for any confusion caused during your review.
From review with @fstab
Conceptually
- we should consider getting rid of the validation scheme: the java client is mostly used indirectly, e.g. from spring, JMX exporter, OTel SDK - where it's strange to have to pass a sys property to unlock UTF-8 chars in metric names and labels
- sanitizeMetricName should not replace UTF-8 chars anymore
- `allow-utf-8` escaping in text formats must still escape some characters like whitespace, newlines, `{` - this is currently not done
Minor issues
- `legacy` is an ambiguous name - better to have `nonUtf8` or similar
- move escaping related methods (`escapeMetricSnapshot`) to separate utils class
Already fixed
- Prometheus name check now uses `underscores` escaping scheme
> `legacy` is an ambiguous name - better to have `nonUtf8` or similar
"legacy" is the term used in upstream prometheus libraries, for better or worse. It's short for "legacy valid prometheus character set". "nonutf8" doesn't quite capture the idea either, because "." is not really a "UTF-8" character.
> `allow-utf-8` escaping in text formats must still escape some characters like whitespace, newlines, `{` - this is currently not done
@zeitlinger AFAIU only backslashes (`\`), newlines (`\n`), and double quotes (`"`) should be escaped; other UTF-8 characters are valid with the new quoting syntax.
yes, this is also the outcome of the meeting yesterday :smile:
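For reference, a rough sketch of the rule agreed on above: names that fit the legacy format are written as before, everything else goes into the quoted syntax with only backslash, newline, and double quote escaped. `QuotedNameSketch` and its methods are hypothetical names for illustration, not the writer code in this PR, and the legacy pattern `[a-zA-Z_:][a-zA-Z0-9_:]*` is an assumption.

```java
// Hypothetical illustration only; not the exposition-format writer from this PR.
final class QuotedNameSketch {
    // Assumed legacy metric name format.
    static boolean isLegacyName(String name) {
        return name.matches("[a-zA-Z_:][a-zA-Z0-9_:]*");
    }

    // Escapes only the three characters discussed above; all other UTF-8 passes through.
    static String escapeQuoted(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '"': sb.append("\\\""); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders a label-less metric name: foo{} for legacy names, {"..."} otherwise.
    static String render(String name) {
        return isLegacyName(name) ? name + "{}" : "{\"" + escapeQuoted(name) + "\"}";
    }
}
```

So `foo` stays `foo{}`, while `foo.bar` is rendered as `{"foo.bar"}`.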