Async collector/gather interface
This might be a bad fit given that async metrics don't come up much, but I figured I would file it here since I ended up writing something like this for work, and it might be nice to upstream it.
**Is your feature request related to a problem? Please describe.**
It would be nice to be able to easily write gauges backed by futures or streams, such as a sqlx query.
**Describe the solution you'd like**
If `gather` and/or `Collector` were (optionally?) async, this might allow for async metrics. That, or some notion of an `AsyncRegistry`, which could possibly live in another crate entirely.
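To make that concrete, here is a rough sketch of what such a trait could look like (not an existing API in this crate; the `AsyncCollector` name and the use of `async-trait` are just assumptions on my part):

```rust
use prometheus::core::Desc;
use prometheus::proto::MetricFamily;

// Hypothetical async counterpart to the existing synchronous `Collector` trait.
// `async-trait` papers over the lack of native async methods in traits.
#[async_trait::async_trait]
pub trait AsyncCollector: Send + Sync {
    fn desc(&self) -> Vec<&Desc>;
    async fn collect(&self) -> Vec<MetricFamily>;
}
```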
**Describe alternatives you've considered**
- Implementing `collect` and using `futures::executor::block_on` inside seems to work, but this potentially ties up the thread `gather` is being run on. Given that this same thread is likely serving a future if the metrics scrape endpoint is using actix, warp, etc., this seems at best a waste and could potentially tie up important worker threads.
- Perform a pass before `Registry::gather` is called which updates normal synchronous metrics. I went with this and wrapped the registry with a type I ended up calling an `AsyncRegistry` (sketched just below). It seems to work reasonably well, but all the machinery for doing this needs to be built.
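A minimal sketch of that second alternative, assuming boxed futures (e.g. run under tokio); the `AsyncRegistry` and `Updater` names here are placeholders, not part of this crate:

```rust
use prometheus::proto::MetricFamily;
use prometheus::Registry;
use std::future::Future;
use std::pin::Pin;

// Each updater refreshes one or more ordinary synchronous metrics from an async source.
type Updater = Box<dyn Fn() -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + Sync>;

pub struct AsyncRegistry {
    inner: Registry,
    updaters: Vec<Updater>,
}

impl AsyncRegistry {
    // Run every async update pass first, then delegate to the normal synchronous gather.
    pub async fn gather(&self) -> Vec<MetricFamily> {
        for update in &self.updaters {
            update().await;
        }
        self.inner.gather()
    }
}
```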
**Additional context**
Here is an example pulled from the crate I wrote, which tacks an `AsyncRegistry` concept onto the normal `prometheus::Registry`:
```rust
let registry = AsyncRegistry::new();

let workflow_state_counts_opts = Opts::new(
    "workflow_state_counts",
    "Number of workflows in state",
);

// `register_async` needs a DbConnectionSource. This should be backed by your application's
// database pool.
let pool = PgPool::new("postgresql://localhost/postgres")
    .await
    .expect("Create db pool");

// Register the async metric labeled by `workflow_state`. See the docs for DbIntGauge
// on the conventions for writing queries.
registry
    .register_async(DbMetric::new(
        pool,
        DbIntGauge::new(
            r#"
            SELECT workflow_state, count(uuid) as value
            FROM workflows
            GROUP BY workflow_state
            "#,
            &["workflow_state"],
            IntGaugeVec::new(workflow_state_counts_opts, &["workflow_state"])
                .expect("Error creating workflow_state_counts metric"),
        ),
    ))
    .expect("Error registering workflow_state_counts metric");
```
I will probably need to split these more-expensive-to-collect metrics into a separate scrape, but other than that it seems to work nicely.
I understand if async is out of the scope of this crate or considered an anti-feature. I figured I would check in here first since first-class async support would probably be a lot cleaner than anything tacked on.
First of all thanks for the detailed feature request @robo-corg.
> I understand if async is out of the scope of this crate or considered an anti-feature.
As a first thought, I find supporting async collectors to be too specific for this general-purpose library, though I would not consider it an anti-feature. In case there are multiple users interested in async collectors, I think it is worth considering and working on a design.
Supporting async collectors is especially interesting for rust-prometheus users building exporters instead of natively instrumenting their logic.
> I figured I would check in here first since first-class async support would probably be a lot cleaner than anything tacked on.
Agreed. I would estimate this to require some larger design work.
What do others think?
I think, as a low-hanging fruit, you should mention something about this in the README.md, so that when a newbie comes around (like me), they will have a hint.
My question is simply whether it is safe to call this library from async code or not. I imagine it is, but my thread will do the calculation, so I should be aware of that.
Anyway, great library; it took me 5 minutes to set up an endpoint with a counter.
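For reference, `gather` itself is a plain synchronous call that does no I/O of its own, so calling it from async code is fine as long as the registered collectors don't block. A minimal sketch, assuming tokio (`metrics_handler` is just an illustrative name, not part of the crate):

```rust
use prometheus::{Encoder, TextEncoder};

// Works inside any async handler (axum, warp, actix, ...): gather() is synchronous,
// so the current task does the encoding work itself.
async fn metrics_handler() -> String {
    let metric_families = prometheus::gather();
    let mut buffer = Vec::new();
    TextEncoder::new()
        .encode(&metric_families, &mut buffer)
        .expect("encoding metrics failed");
    // If registered collectors are expensive, this body could instead be wrapped in
    // tokio::task::spawn_blocking to avoid stalling an async worker thread.
    String::from_utf8(buffer).expect("metrics are valid UTF-8")
}
```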
It's been a while since this discussion was last updated, but for people building exporters this would really be helpful. It would allow tying the (async) collection to the actual scrapes, which I'd say is more reliable than collecting the data in the background and then letting Prometheus just scrape the internal "cache".
Having to maintain two scrape intervals usually results in unreliable data or lost precision.
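To spell out the background-cache pattern being contrasted here (a sketch assuming tokio; `fetch_workflow_count` is a stand-in for whatever async query you run):

```rust
use prometheus::IntGauge;
use std::time::Duration;

// A task refreshes an ordinary gauge on its own interval, and Prometheus scrapes
// whatever value happens to be cached; the two intervals then have to be kept in sync.
fn spawn_background_collection(gauge: IntGauge) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(30));
        loop {
            ticker.tick().await;
            let value = fetch_workflow_count().await;
            gauge.set(value);
        }
    });
}

async fn fetch_workflow_count() -> i64 {
    // Placeholder for an actual async data source, e.g. an sqlx query.
    0
}
```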