client_java icon indicating copy to clipboard operation
client_java copied to clipboard

: does not get converted to _

Open jonatan-ivanov opened this issue 4 months ago • 11 comments

From @shakuzen in https://github.com/micrometer-metrics/micrometer/pull/6644#issuecomment-3203934641:

it seems : does not get converted to _ in 1.4.1 but it did in 1.3.10. Is that change intentional?

This breaks the Micrometer build, see https://github.com/micrometer-metrics/micrometer/pull/6644 and https://github.com/micrometer-metrics/micrometer/pull/6644#issuecomment-3203934641 for more details and reproducing tests.

jonatan-ivanov avatar Sep 02 '25 20:09 jonatan-ivanov

Yes, this is intended - the behavior is described in https://prometheus.github.io/client_java/exporters/unicode/

zeitlinger avatar Sep 04 '25 07:09 zeitlinger

Thanks for the confirmation. It's at odds with what I expected based on the following in the 1.4.0 release notes (added emphasis is mine):

UTF-8 is now supported for metric and label names!

  • this is a backwards compatible change
    • only Prometheus servers that can handle UTF-8 will actually see it
    • otherwise you'll still get the same _ replacement characters as before

Am I misunderstanding something?

shakuzen avatar Sep 04 '25 07:09 shakuzen

Am I misunderstanding something?

it could have been phrased better 😄

otherwise you'll still get the same

"you" is the end user - meaning that in the client, you'll not see the _ conversion any more

zeitlinger avatar Sep 04 '25 08:09 zeitlinger

Micrometer is a bit of a special use case in that it is using things directly that most users probably don't. We're depending on PrometheusNaming and specifically the behavior of the methods sanitizeMetricName and sanitizeLabelName. Here is a unit test that passes with 1.3.10 and fails with 1.4.1:

@Test
void prometheusNaming() {
    assertThat(PrometheusNaming.sanitizeMetricName("te:st")).isEqualTo("te_st");
    assertThat(PrometheusNaming.sanitizeLabelName("te:st")).isEqualTo("te_st");
}

Looking at these changes, I can see the methods replaceIllegalCharsInMetricName and replaceIllegalCharsInLabelName were removed which explains the behavior difference. Most users I suspect will not notice this difference in behavior because they aren't using PrometheusNaming methods themselves.

Perhaps the behavior of PrometheusNaming.sanitizeMetricName was wrong before (escaped more than it should have) because I see even with 1.3.10 there seems to be no escaping when making a metric name with : in it using the Prometheus client, e.g.:

PrometheusRegistry prometheusRegistry = new PrometheusRegistry();
Counter counter = Counter.builder().name("te:st").labelNames("test").register(prometheusRegistry);
counter.labelValues("te:st").inc();
ExpositionFormats.init().getPrometheusTextFormatWriter().write(System.out, prometheusRegistry.scrape());

This will print

# TYPE te:st_total counter
te:st_total{test="te:st"} 1.0

For now, I've updated our tests to match the new behavior so that we could upgrade versions of the Prometheus client for our 1.16.0-M3 release, but I'm hoping to figure out if I should document the difference in behavior or if our use of PrometheusNaming isn't appropriate for what we're doing: sanitizing Micrometer meter names and tag names for use with the Prometheus client.

shakuzen avatar Sep 10 '25 08:09 shakuzen

Perhaps the behavior of PrometheusNaming.sanitizeMetricName was wrong before

sanitizeMetricName is meant to return a string that won't throw an exception when you pass it to the prom client.

  • Old: % was illegal - and thus replaced by _
  • New: % is now an allowed char, so it's not replaced any more

The end user will only see % when they allow it at scrape time - see https://prometheus.github.io/client_java/exporters/unicode/

what about :?

: is the only char that was replaced by the sanitizer, even though it was actually a valid char before - see https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

For that reason, you get te:st_total{test="te:st"} 1.0 even though you didn't supply an escaping scheme to the write method.

I was not aware of this difference - I have to think about reverting the behavior for :.

if I should document

I would recommend to point to the unicode page

zeitlinger avatar Sep 10 '25 08:09 zeitlinger

sanitizeMetricName is meant to return a string that won't throw an exception when you pass it to the prom client.

Thanks. I think that's exactly what we need in Micrometer to convert from whatever the name is in Micrometer to what we pass to the Prometheus Java client. So should we continue using that (and sanitizeLabelName) or should we migrate to something else given the related changes (and potential changes in #1561)? I'm trying to figure out what's the correct usage going forward for Micrometer.

shakuzen avatar Sep 11 '25 04:09 shakuzen

sanitizeMetricName is meant to return a string that won't throw an exception when you pass it to the prom client.

  1. document the breaking change for : for micrometer users
  • supports Unicode
  1. preserve old behavior and not support Unicode
  • this is what https://github.com/prometheus/client_java/pull/1561 does, because it restores what the sanitize methods do
  • if you want Unicode, you have to use the PrometheusNames class instead
  1. add a new flag to micrometer to support Unicode
  • supportUnicode ? PrometheusNames.sanitize(..) : PrometheusNaming.sanitize(..)
  • this is similar to translation_strategy that OTel uses
    • except that it also deals being able to configure adding suffixes, which is still under discussion

zeitlinger avatar Sep 11 '25 07:09 zeitlinger

@shakuzen which solution would be your preference?

zeitlinger avatar Sep 11 '25 14:09 zeitlinger

Sorry for the late reply.

Micrometer generally has had support for unicode from the beginning. But each registry implementation (e.g. Prometheus) may not support unicode. That's where we use a NamingConvention to translate from Micrometer naming to what the registry expects. If we use a client library in the registry implementation, like we do the Prometheus Java client, we need to translate to what the client library expects. There we are calling PrometheusNaming.sanitize to transform the Micrometer name for a Meter or label to what the Prometheus Java client will use.

Since Prometheus Java client 1.4 added Unicode support, it seems it generally accepts anything, and depending on the exposition format/options, things will be sanitized at the time of scraping. In that sense, perhaps the current behavior makes sense, given that the accepted characters for the "internal name" or "intermediate name" (I'm not sure what to call it but what the client uses internally) used for metrics and labels in the Prometheus client have changed.

I'm fine to document this. I don't know how many users were relying on the previous sanitization that was happening. It would be unusual to have a : character in a meter name, but they could make and configure a custom implementation of PrometheusNamingConvention to restore that behavior if it's really needed. Unicode characters will still be sanitized when scraping, unless unicode is requested, so that's only changed for users opting-in. There is more separation now between what the client will accept for names and what will be exposed in a scrape. Some users may have issues with this, but it seems inherent to the domain given the flexibility in the client to export the same registered metrics in different scrape formats, and now with different options (unicode enabled or not). See, for example, https://github.com/micrometer-metrics/micrometer/issues/5923, where a user is hoping they can rely on the Micrometer PrometheusNamingConvention (where under the hood we use PrometheusNames) to figure out what the names will be in their backend given the Micrometer meter name and tag names, for the purpose of generating dashboards and graphs.

shakuzen avatar Sep 18 '25 07:09 shakuzen

For now, I've added a section in a migration guide for Micrometer 1.16 mentioning this difference.

shakuzen avatar Sep 18 '25 08:09 shakuzen

Thanks for the background 😄

I like your approach - especially giving users an idiomatic way to restore the old behavior only if they really need it!

I will keep the PR to this issue around - but probably close it later.

zeitlinger avatar Sep 18 '25 08:09 zeitlinger