gauge.set_function() doesn't work in multiprocess mode
Multiprocess mode's collect() reads the registry files and aggregates metrics that have been written to prometheus_multiproc_dir.
This doesn't work with `gauge.set_function()`, which does not record its value: the provided function is only called at collection time.
That means with the following code:
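```python
from prometheus_client import CollectorRegistry, Gauge, multiprocess
# Assumes prometheus_multiproc_dir points at a writable directory.
registry = CollectorRegistry()
Gauge("test", "test", registry=registry).set_function(lambda: 100)
multiprocess.MultiProcessCollector(registry)
```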
The output will be:
```
# HELP test test
# TYPE test gauge
test 100.0
# HELP test Multiprocess metric
# TYPE test gauge
test{pid="10705"} 0.0
```
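To see why the multiprocess collector reports `0.0`, here is a toy model of the mechanism described above, using only the standard library. It is not client_python's implementation: `ToyGauge`, `collect_from_store` and the dict standing in for the per-process files in `prometheus_multiproc_dir` are hypothetical. It assumes, as the output above suggests, that creating the gauge writes an initial `0.0` for the process, while `set_function()` only stores the callback in-process, so the shared store never sees `100`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the per-process files: {pid: {metric_name: value}}
Store = Dict[int, Dict[str, float]]


class ToyGauge:
    """Toy gauge: set() writes to the shared store, set_function() does not."""

    def __init__(self, name: str, pid: int, store: Store) -> None:
        self.name = name
        self.pid = pid
        self.store = store
        self._fn: Optional[Callable[[], float]] = None
        # Assumption: creating the gauge writes an initial 0.0 for this pid.
        store.setdefault(pid, {})[name] = 0.0

    def set(self, value: float) -> None:
        self.store[self.pid][self.name] = float(value)

    def set_function(self, fn: Callable[[], float]) -> None:
        # Only remembered inside this process; nothing reaches the store.
        self._fn = fn

    def collect_local(self) -> float:
        # What a registry in the same process would report.
        if self._fn is not None:
            return float(self._fn())
        return self.store[self.pid][self.name]


def collect_from_store(store: Store, name: str) -> List[Tuple[Dict[str, str], float]]:
    # Toy multiprocess collection in mode "all": one pid-labelled sample per process.
    return [
        ({"pid": str(pid)}, values[name])
        for pid, values in sorted(store.items())
        if name in values
    ]


store: Store = {}
gauge = ToyGauge("test", pid=10705, store=store)
gauge.set_function(lambda: 100)

print("local:", gauge.collect_local())  # local: 100.0
print("multiprocess:", collect_from_store(store, "test"))
# multiprocess: [({'pid': '10705'}, 0.0)]
```

Only a plain `set()` would change what the store, and therefore the multiprocess output, contains.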
Current side effects:
- If any mode other than `all` or `liveall` is used, the `pid` label won't be included. This results in duplicate metrics being reported to Prometheus. Prometheus currently only uses the first one it reads, which is non-deterministic due to iteration over the registry's dictionary.
- If `registry=None` is used to avoid double reporting, only the default value of `0.0` is reported.
- The current workaround is to use mode `all` and, in Prometheus, ignore the gauge samples that carry the `pid` label.
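The first side effect can be reproduced in the same toy style. This sketch assumes, as described above, that `all` and `liveall` keep a `pid` label while the other modes collapse the per-process values into one unlabelled sample; only `min`, `max` and `sum` are modelled and process liveness is ignored. The collapsed sample has the same name and label set as the one the in-process `set_function()` gauge reports, which is where the duplicate series comes from. `aggregate` and `find_duplicates` are hypothetical helpers, not client_python functions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Sample = Tuple[str, FrozenSet[Tuple[str, str]], float]  # (name, labels, value)


def aggregate(name: str, per_pid: Dict[int, float], mode: str) -> List[Sample]:
    """Toy aggregation of per-process gauge values by multiprocess mode."""
    if mode in ("all", "liveall"):
        # Liveness is ignored here; every pid keeps its own labelled sample.
        return [
            (name, frozenset({("pid", str(pid))}), value)
            for pid, value in sorted(per_pid.items())
        ]
    reducers = {"min": min, "max": max, "sum": sum}
    # Collapsing the values drops the pid label entirely.
    return [(name, frozenset(), float(reducers[mode](per_pid.values())))]


def find_duplicates(samples: List[Sample]) -> List[Tuple[str, FrozenSet[Tuple[str, str]]]]:
    """Return (name, labels) pairs that appear more than once."""
    counts = Counter((name, labels) for name, labels, _ in samples)
    return [key for key, n in counts.items() if n > 1]


# The in-process set_function() gauge reports 100 with no labels,
# while the multiprocess files only contain the initial 0.0.
direct: List[Sample] = [("test", frozenset(), 100.0)]
per_pid = {10705: 0.0}

for mode in ("all", "max"):
    samples = direct + aggregate("test", per_pid, mode)
    print(mode, "duplicates:", find_duplicates(samples))
# all duplicates: []
# max duplicates: [('test', frozenset())]
```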
Proposal:
I'm not sure how you could incorporate `set_function` into the multiprocess registry, and I'm not convinced it would be a very useful feature. Would it be reasonable to add a new mode, `multiprocess_mode='exclude'`, that prevents the incorrect `0.0` value from being reported? Or would it be better to just add documentation recommending two independent registries?
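To make the proposal concrete, here is a sketch of what a hypothetical `exclude` mode could do on the collection side: metrics marked `exclude` are skipped when reading the per-process files, so the placeholder `0.0` is never emitted and only the in-process `set_function()` value remains. `exclude` does not exist in client_python; `collect_multiprocess` and the `(name, mode, per_pid)` tuples are made up for illustration, and only mode `all` is modelled otherwise.

```python
from typing import Dict, List, Tuple

# Each entry: (metric name, multiprocess mode, {pid: value read from that pid's file})
FileMetric = Tuple[str, str, Dict[int, float]]


def collect_multiprocess(metrics: List[FileMetric]) -> List[Tuple[str, Dict[str, str], float]]:
    """Toy collector honouring a hypothetical 'exclude' mode."""
    out = []
    for name, mode, per_pid in metrics:
        if mode == "exclude":
            # Proposed behaviour: never emit the placeholder value from the files.
            continue
        if mode != "all":
            raise ValueError(f"mode {mode!r} is not modelled in this sketch")
        # Mode 'all': one pid-labelled sample per process.
        for pid, value in sorted(per_pid.items()):
            out.append((name, {"pid": str(pid)}, value))
    return out


files: List[FileMetric] = [
    ("test", "exclude", {10705: 0.0}),  # the set_function() gauge
    ("requests_in_flight", "all", {10705: 3.0}),
]
print(collect_multiprocess(files))
# [('requests_in_flight', {'pid': '10705'}, 3.0)]
```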
This counts as a custom collector, and custom collectors can't be made to work with multiprocess mode.
Makes sense, I'll look for ways to work around that, or ask about them on the mailing list.
Given that multiprocess mode is getting some traction, is it worth reopening this?
I'm still not sure how this could be done in multiprocess mode. You would need to run the function in each process, which is not a pattern we have today, and I believe it would need to integrate with whatever starts the processes. If there is a simpler way I am all ears, but otherwise I think this should stay closed.
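One way to approximate "running the function in each process" without library support is to start a small sampler thread in every worker that calls the function on an interval and records the result with a plain `set()`, which multiprocess mode does persist. The sketch below is stdlib-only; `PeriodicSampler` is a hypothetical helper, and with client_python the `sink` would presumably be a gauge's `set` method. Two caveats: the reported value can be up to one interval stale, and the sampler must be started inside each worker after it is forked, which is the integration point mentioned above.

```python
import threading
from typing import Callable, List


class PeriodicSampler:
    """Calls fn() every `interval` seconds and passes the result to sink()."""

    def __init__(self, fn: Callable[[], float], sink: Callable[[float], None],
                 interval: float = 5.0) -> None:
        self._fn = fn
        self._sink = sink
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def sample_once(self) -> None:
        self._sink(float(self._fn()))

    def _run(self) -> None:
        # Sample immediately, then once per interval until stop() is called.
        while True:
            self.sample_once()
            if self._stop.wait(self._interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


# A list stands in for the gauge here; with client_python the sink would be
# something like my_gauge.set (my_gauge being a hypothetical Gauge in this worker).
recorded: List[float] = []
sampler = PeriodicSampler(lambda: 100, recorded.append, interval=0.01)
sampler.start()
sampler.stop()
print(recorded[0])  # 100.0
```

Because `_run` samples before its first wait, at least one value is recorded even if `stop()` is called immediately.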
Perhaps just documentation, or a logged failure. Right now you're just left wondering.
:+1: Documentation on the function is a good idea to avoid confusion.