explorer icon indicating copy to clipboard operation
explorer copied to clipboard

Emit metrics for Prometheus

Open wileyj opened this issue 5 years ago • 18 comments

Describe the bug Port 9153 to emit metrics is no longer open

Expected behavior port 9153 is open, and the path /metics reports TSDB data

wileyj avatar Jan 15 '21 23:01 wileyj

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 15 '21 01:07 stale[bot]

bump

wileyj avatar Jul 15 '21 14:07 wileyj

history: because of how the explorer works, it can potentially track millions of paths for latency and other metrics. since we're using a pull based metrics model, every n seconds when we pull the metrics from the explorer the data we're pulling can be an extremely large dataset. what was happening was when metrics were enabled for the explorer, every n seconds the explorer latency would spike while the metrics were pulled. Not to mention the latency measurements are essentially unusable since each path is different.

to resolve this for the API, @zone117x did some work to store similar paths into dedicated buckets. for example, all /block/xxxx requests would be stored in a blocks bucket, etc.

I'll look for the PR in that repo that enabled this and link here shortly.

wileyj avatar Jul 19 '21 15:07 wileyj

https://github.com/blockstack/stacks-blockchain-api/pull/502

https://github.com/blockstack/stacks-blockchain-api/blob/master/src/api/init.ts#L196-L226

wileyj avatar Jul 19 '21 15:07 wileyj

Thanks @wileyj 🙏 Assigning to @aulneau so he can provide some input as well for anyone who might tackle this issue.

markmhendrickson avatar Jul 19 '21 18:07 markmhendrickson

I can start looking at this if @aulneau can provide a bit more context?

fbwoolf avatar Jul 19 '21 18:07 fbwoolf

Sure -- so we need to do a few things.

We are no longer using the custom server found here: https://github.com/blockstack/explorer/blob/main/server/index.js

We need to start using it again, and it needs to be updated in a few ways:

Basically these changes make it so that all types of paths put their metrics into a single respective bucket, meaning rather than having a single metrics entry per address that is navigated to, we have all metrics combined into a single address bucket.

@zone117x might be able to provide further guidance

aulneau avatar Jul 19 '21 18:07 aulneau

The API PR first implementing the route condensing is actually https://github.com/blockstack/stacks-blockchain-api/pull/412

zone117x avatar Jul 19 '21 20:07 zone117x

with the promster library - they have some defaults set that are capping our data. https://github.com/tdeekens/promster

for the sample metric, http_request_duration_seconds_bucket they clearly state this in the readme:

http_request_duration_seconds: a Prometheus histogram with request time buckets in milliseconds (defaults to [ 0.05, 0.1, 0.3, 0.5, 0.8, 1, 1.5, 2, 3, 5, 10 ])

which corresponds with the data we've seen. It appears it's possible to modify this, but would require a code change (in the API and any other repo where these are not the defaults we want): https://github.com/tdeekens/promster/blob/main/packages/metrics/src/create-metric-types/create-metric-types.ts#L173-L184

for the stacks-blockchain-api repo, this was addressed in this commit: https://github.com/blockstack/stacks-blockchain-api/commit/735874e45c1e198724e7d01ca9e4eec4d108706c

wileyj avatar Sep 15 '21 14:09 wileyj

@wileyj Should I assign this issue to you? I'm not exactly sure where it stands in relation to the changes made by @fbwoolf with https://github.com/blockstack/explorer/pull/491

markmhendrickson avatar Sep 16 '21 11:09 markmhendrickson

@wileyj Should I assign this issue to you? I'm not exactly sure where it stands in relation to the changes made by @fbwoolf with #491

That PR was put on hold and this issue was unassigned to me bc I started it prematurely before we could focus on performance/metrics. It needed further development/testing to get working.

fbwoolf avatar Sep 16 '21 13:09 fbwoolf

@wileyj Should I assign this issue to you? I'm not exactly sure where it stands in relation to the changes made by @fbwoolf with #491

the comment I made was informational only based on something we learned from the stacks-blockchain-api

wileyj avatar Sep 16 '21 13:09 wileyj

@fbwoolf do you have enough context on this issue already to t-shirt it? I presume it's "1+ weeks dev" as it currently stands?

@wileyj could you specify precisely what data is currently not available for the Explorer via Prometheus given where things stand today (i.e. just what data this issue would give us once executed)?

markmhendrickson avatar Nov 15 '21 10:11 markmhendrickson

@fbwoolf do you have enough context on this issue already to t-shirt it? I presume it's "1+ weeks dev" as it currently stands?

I do think 1 week makes sense just bc there is some research involved in understanding what we need to do for this and how to test it. Where are the new labels? Am I supposed to apply it here?

fbwoolf avatar Nov 15 '21 14:11 fbwoolf

Didn't mean to close this!

CharlieC3 avatar Mar 25 '22 19:03 CharlieC3

@MrHeidar Any chance we can revitalize this issue? It would be instrumental in learning more about how the Explorer functions internally.

CharlieC3 avatar Apr 08 '22 15:04 CharlieC3

We can review for prioritization again. @CharlieC3 any additional context re: the types of data points this will give us in particular (per https://github.com/hirosystems/explorer/issues/329#issuecomment-968726346)?

markmhendrickson avatar Apr 11 '22 11:04 markmhendrickson

@markmhx I don't believe any custom data points were added in Explorer code, so it should at least provide the promster defaults if not more: https://github.com/tdeekens/promster#garbage-collection https://github.com/tdeekens/promster#http-timings-hapi-express-marblejs-and-fastify

CharlieC3 avatar Apr 11 '22 13:04 CharlieC3

I'll close this issue for now

andresgalante avatar Dec 16 '22 18:12 andresgalante