
Heatmap charts in Netdata

Open hugovalente-pm opened this issue 3 years ago • 25 comments

Goal

The need comes from the representation of latency charts, which are currently rendered as stacked charts like this: image

In the above charts, there are buckets of latencies as dimensions. These buckets are hard-coded in the data collector. This is a real example of a data query:

json payload for latency filesystem.ext4_write_latency chart
{
   "api": 1,
   "id": "filesystem.ext4_write_latency",
   "name": "filesystem.ext4_write_latency",
   "view_update_every": 5,
   "update_every": 5,
   "first_entry": 1644602900,
   "last_entry": 1644602950,
   "before": 1644602950,
   "after": 1644602905,
   "dimension_names": ["0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
   "dimension_ids": ["0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
   "latest_values": [0.0739187, 0.2739483, 0.2000296, 0.569623, 1.7698006, 0.6522811, 0, 0],
   "view_latest_values": [0.0739187, 0.2739483, 0.2000296, 0.569623, 1.769801, 0.6522811, 0, 0],
   "dimensions": 8,
   "points": 10,
   "format": "json",
   "result": {
      "labels": ["time", "0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
      "data": [
         [ 1644602905000, 2385.544, 521.8245, 78.06823, 20.11922, 16.152047, 1.983585, 0.2833692, 0.1416846],
         [ 1644602910000, 1005.6769, 219.691, 32.25781, 8.280783, 7.026062, 0.8164153, 0.1166308, 0.0583154],
         [ 1644602915000, 108.25282, 43.08954, 2.594651, 1.764481, 2.742578, 0.5041375, 0, 0],
         [ 1644602920000, 55.32629, 23.59504, 1.605293, 1.4134601, 1.731273, 0.4218429, 0.1259804, 0],
         [ 1644602925000, 4.663792, 2.39492, 2.216843, 0.9783493, 1.0303782, 0.4521649, 0.0740196, 0.1260484],
         [ 1644602930000, 2.736208, 1.5311441, 1.2571766, 0.5697738, 0.7697898, 0.2218547, 0, 0.0739516],
         [ 1644602935000, 0.1259911, 0.5778999, 0.1259911, 0.4519088, 0.9038177, 0, 0, 0],
         [ 1644602940000, 0.0740089, 0.2960356, 0.2000805, 0.3480983, 0.4440535, 0.1260716, 0.1260716, 0],
         [ 1644602945000, 0.1260813, 0.2521627, 0.2000097, 0.8304164, 1.512976, 0.326091, 0.0739284, 0],
         [ 1644602950000, 0.0739187, 0.2739483, 0.2000296, 0.569623, 1.769801, 0.6522811, 0, 0]
      ]
   },
   "min": 0,
   "max": 2385.544
}

We need to render this data in a different way, so that users can see the latencies as a heatmap. For example, this is what Grafana is doing:

image

In the above chart:

  • Every dimension (time bucket) now has its own row on the y-axis. So, the y-axis is no longer used for the value of each point. It just allocates space for each dimension.
  • The colour intensity of each point represents its value. The exact colour used is based on the colour scale below the chart.
  • Each point now has a height, so instead of being rendered as a "point" it is a small vertical line.
  • The colour scale is fixed, but the values that are assigned to each colour depend on the min and max value available for all dimensions in the given timeframe (the dataset we received). In the Grafana example above, min = 0, while max = 0.38.
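
A minimal sketch of the mapping described in the bullets above, assuming the `labels` and `data` arrays have the shape of the `result` object in the payload. The function name `heatmap_cells`, the parameter `n_colors` and the linear scale are assumptions for illustration only, not the charts library's actual implementation:

```python
from typing import Dict, List, Sequence


def heatmap_cells(
    labels: Sequence[str],
    data: Sequence[Sequence[float]],
    n_colors: int = 5,
) -> Dict[str, List[int]]:
    """Turn a query result into one y-axis row per dimension.

    Each cell is an index into a fixed colour scale of `n_colors` steps.
    The min and max are taken over all dimensions in the received dataset,
    so the same scale applies to every row.
    """
    dimensions = labels[1:]  # labels[0] is "time"
    values = [v for row in data for v in row[1:]]
    lo, hi = min(values), max(values)
    span = hi - lo

    rows: Dict[str, List[int]] = {}
    for col, name in enumerate(dimensions, start=1):
        rows[name] = [
            # Linear scale (an assumption); the max value is clamped to the last colour.
            0 if span == 0 else min(n_colors - 1, int((row[col] - lo) / span * n_colors))
            for row in data
        ]
    return rows
```

With the payload above, `heatmap_cells(result["labels"], result["data"])` would return eight rows of ten cells each. Because the values are heavily skewed, a non-linear scale may be worth considering, but that is outside what this ticket specifies.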

The tooltip of the chart on Grafana has a small histogram, showing the values of all dimensions for a given time (x-axis). In our case, we could keep the tooltip we already have, but:

  • don't sort the dimensions and
  • add a histogram (like Grafana does) above a list of dimensions.
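
As a rough illustration of the tooltip idea above, here is a sketch that builds a text histogram for a single timestamp while keeping the dimensions in their original order. The function name `tooltip_histogram` and the `width` parameter are hypothetical, and a real tooltip would draw bars rather than `#` characters:

```python
from typing import List, Sequence


def tooltip_histogram(
    labels: Sequence[str],
    row: Sequence[float],
    width: int = 20,
) -> List[str]:
    """Build one histogram line per dimension for a single data row.

    `labels[0]` is "time" and `row[0]` is the timestamp, so both are skipped.
    Dimensions are NOT sorted: they keep the order sent by the collector.
    """
    values = row[1:]
    peak = max(values) if values else 0
    lines = []
    for name, value in zip(labels[1:], values):
        # Bar length is proportional to the largest value at this timestamp.
        bar = "#" * (round(value / peak * width) if peak > 0 else 0)
        lines.append(f"{name:>14} | {bar} {value}")
    return lines
```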

This new chart type should be available on Netdata Cloud for any chart, and the above logic should apply as well:

  • dimensions are converted to y-axis rows
  • values are distributed on a colour scale
  • the colour intensity of each point represents its value

THIS IS FOR NETDATA CLOUD. The Agent dashboard will fall back to stacked charts.


Tasks to complete

FE

  • [x] make the new heatmap chart type available in our charts library (this repo/ticket)
  • [x] add the heatmap chart type as an option in the Netdata Cloud chart options (this repo/ticket)
  • [ ] https://github.com/netdata/dashboard/issues/403

Cloud BE

  • [x] change the Cloud BE API that is consumed by the Cloud FE to support a new chart type (this new ticket on this repo)
  • [x] update the swagger specs on Cloud BE for the API that is consumed by the Cloud FE, so that the new heatmap chart type option is available (this new ticket on this repo)

Netdata Agent

  • [x] https://github.com/netdata/netdata/issues/12926
  • [ ] https://github.com/netdata/netdata/issues/12925
  • [x] https://github.com/netdata/netdata/issues/12927
  • [x] https://github.com/netdata/netdata/issues/12928

Documentation

  • [x] https://github.com/netdata/learn/issues/937

hugovalente-pm avatar Feb 09 '22 16:02 hugovalente-pm

This is great!

You have to think of 2 additional issues:

  1. The new charts need to allow turning any chart into a heatmap. So, charts 2.0 on cloud need to have this option.
  2. Chart types are defined (per chart) at the data collector and are propagated to the UI. So, a new chart type called heatmap should be created in netdata (now it only has line, area, stacked) and be allowed to reach the dashboard. For the old agent dashboard this should fall back to area.

ktsaou avatar Feb 09 '22 18:02 ktsaou

@ktsaou thanks for the inputs. please see my comments below:

  1. The new charts need to allow turning any chart into a heatmap. So, charts 2.0 on cloud need to have this option.

will specify this on the requirements above

  1. Chart types are defined (per chart) at the data collector and are propagated to the UI. So, a new chart type called heatmap should be created in netdata (now it only has line, area, stacked) and be allowed to reach the dashboard. For the old agent dashboard this should fall back to area.

For the old agent dashboard I will open an issue on netdata/dashboard and link it here.

For the data collectors we probably need inputs from @ilyam8 and/or @thiagoftsm. Guys, what are the changes we need to do on collectors or other things on the Agent?

hugovalente-pm avatar Feb 09 '22 18:02 hugovalente-pm

@ktsaou I mentioned "will add it" not that I had already added it 😅 the ticket is now updated 👍

hugovalente-pm avatar Feb 09 '22 18:02 hugovalente-pm

Guys, what are the changes we need to do on collectors or other things on the Agent?

@hugovalente-pm when we create charts, we give as argument the chart type, so we will need to change the collectors to send heatmap, area or any other type. In netdata core we will need to add the new chart types for the agent to understand what we are sending.

I suggest we sync the cloud and agent dashboards so that we release the new feature together. I think we would give a bad user experience if users see a heatmap on cloud and a stacked chart on the agent.

thiagoftsm avatar Feb 09 '22 19:02 thiagoftsm

@thiagoftsm unfortunately the agent shipped dashboard will stay behind. The agent will show stacked charts, until we update its entire visualization library to the one used by the cloud. That seems like quite a lot of work currently. It will eventually be missing a lot of features: the Anomaly Advisor, metrics correlations, etc.

We are working on a feature to mark the agent dashboard as old and the cloud one as new, so all agent dashboard users will be advised to use the new one and only fall back to the old one when the agent is not connected to cloud.

We are also investigating a way of bringing the cloud dashboard on the agent, without actually refactoring it. We will see if that works somehow.

For the moment, let the agent shipped dashboard stay behind. We will deal with this later.

ktsaou avatar Feb 09 '22 21:02 ktsaou

Thanks @ktsaou, I did not know all these details.

@hugovalente-pm I suggest we schedule a meeting to understand better how we can address these different behaviors. Is the cloud storing in its own database the user preferences for charts? If it is doing this, we can give to you a list of charts that will use heatmaps by default.

thiagoftsm avatar Feb 09 '22 21:02 thiagoftsm

Is the cloud storing in its own database the user preferences for charts

A user can change the type of a chart, but the default is still controlled by the data collector. So, until we move this setting away from the collector, it is the only place to do it. This means that the plugins.d protocol and the netdata internal RRDSET structures have to support a new chart type: heatmap. And of course all the documentation has to be updated.

we can give to you a list of charts that will use heatmaps

Everything that uses time buckets should be turned by default to heatmap, by changing its chart type to heatmap at the data collector. Yes, please come up with this list, so that we can update the ticket and know what needs to be done.

There are a few collectors, like the web_log that present latency as max and avg, statsd timers that present latency in similar ways, fping (this needs a PR to the fping repo) and possibly more. We should modify these collectors to use time buckets and create heatmap charts too.

Generally we should use heatmap charts when:

  • The collector collects time buckets already
  • The collector collects individual events (web log responses each with a duration, statsd timer events each with its own duration, fping pings each with its own duration), which can be allocated into fixed time buckets, instead of just doing a min, max, average, etc on them.
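
To illustrate the second case, here is a minimal sketch of allocating individual event durations into fixed time buckets. The power-of-two boundaries and the label format are copied from the `filesystem.ext4_write_latency` payload at the top of this issue. The function names and the choice of an inclusive lower bound are assumptions, not how any existing collector does it:

```python
from collections import Counter
from typing import Iterable


def bucket_label(duration_us: float) -> str:
    """Return the power-of-two bucket, e.g. "2us->4us", for one duration."""
    if duration_us < 1:
        return "0us->1us"
    lower = 1
    # Double the lower bound until the duration fits in [lower, 2 * lower).
    while lower * 2 <= duration_us:
        lower *= 2
    return f"{lower}us->{lower * 2}us"


def bucket_counts(durations_us: Iterable[float]) -> Counter:
    """Count events per bucket instead of reducing them to min/max/average."""
    return Counter(bucket_label(d) for d in durations_us)
```

Each bucket would then become one dimension of the chart, with the per-interval count (or rate) as its value.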

how we can address these different behaviors

What do you mean? which behaviors?

ktsaou avatar Feb 10 '22 07:02 ktsaou

thanks @thiagoftsm for all the details, as @ktsaou said, we will have for now the Agent dashboard falling back to stacked charts when a collector has defined that a given chart should be a heatmap

to summarise behaviour:

  • Agent collector has an eBPF chart that is a latency chart and defines the chart type heatmap
  • Agent Dashboard doesn't support the heatmap so it will show the chart as a stacked chart
  • Netdata Cloud UI supports the heatmap and, with the metadata for the chart, it sees this is to be shown as a heatmap, so it displays the new chart type
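
The behaviour summarised above could be sketched as follows. All names here (`AGENT_DASHBOARD_TYPES`, `CLOUD_UI_TYPES`, `FALLBACKS`, `resolve_chart_type`) are hypothetical, and defaulting unknown types to `line` is an assumption. It only shows that the mapping lives in the front-end and that the agent just passes the type along:

```python
from typing import Optional, Set

# Chart types each front-end can render (illustrative only).
AGENT_DASHBOARD_TYPES = {"line", "area", "stacked"}
CLOUD_UI_TYPES = {"line", "area", "stacked", "heatmap"}

# What a front-end renders when it does not support the declared type.
FALLBACKS = {"heatmap": "stacked"}


def resolve_chart_type(
    declared: str,
    supported: Set[str],
    user_override: Optional[str] = None,
) -> str:
    """Pick the chart type a front-end should render.

    `declared` is the type the collector attached to the chart.
    `user_override` is the type a user picked in the UI, if any.
    """
    wanted = user_override or declared
    if wanted in supported:
        return wanted
    # Unknown types default to "line" here; that default is an assumption.
    return FALLBACKS.get(wanted, "line")
```

For example, `resolve_chart_type("heatmap", AGENT_DASHBOARD_TYPES)` gives `"stacked"`, while the same call with `CLOUD_UI_TYPES` gives `"heatmap"`.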

at Netdata Cloud, for any chart, a user can override the "suggested" chart type. on Overview and Single Node view this is currently only stored for the present user session, not stored as user preferences. on Custom Dashboards users can define the chart type and this will be saved as part of the definition of the Custom Dashboard

to try to summarize what is needed for this feature, and in order to open respective tickets on other repos, we have:

  • [ ] make the new heatmap chart type available in our charts library (this repo/ticket)
  • [ ] add the heatmap chart type as an option in the Netdata Cloud chart options (this repo/ticket)
  • [ ] update the swagger specs on the Agent for the API that is consumed by the BE, so that the new heatmap chart type option is available (this should be on netdata/netdata repo? will probably ask your help to do this @thiagoftsm )
  • [ ] change the collectors that currently support these buckets so that charts with time buckets are defined with the heatmap chart type (this should be on netdata/netdata repo? will probably ask your help to do this @thiagoftsm ; issue: netdata/netdata#12925)
  • [ ] change the Cloud BE API that is consumed by the Cloud FE to support a new chart type (this new ticket on this repo)
  • [ ] update the swagger specs on Cloud BE for the API that is consumed by the Cloud FE, so that the new heatmap chart type option is available (this new ticket on this repo)
  • [ ] update our documentation (https://github.com/netdata/learn/issues/937) for:
    • the collectors that will be changed, noting that charts with time buckets will be shown as heatmaps
    • any other relevant documentation where we specify chart types

hugovalente-pm avatar Feb 10 '22 10:02 hugovalente-pm

What do you mean? which behaviors?

My concern is that we store heatmap while the agent can only show stacked, and the cloud will show heatmap.

@ktsaou last time I talked with our designer I was informed that we could not change color scheme like we were expecting. For this specific feature we will need either to change the JS files to understand what we will show when it receives a heatmap, or we will need to change our internal API to change heatmap to stack. Considering the previous experience, I understand that the internal API can be the simplest road, but I will have to confirm with our agent team the possible impact on ACLK. On the other hand, considering that we will classify our current dashboard as old, we could only add a JS object that has this simple association. I would need to talk with the visualization team to verify that we won't have blockers here.

thiagoftsm avatar Feb 10 '22 12:02 thiagoftsm

last time I talked with our designer I was informed that we could not change color scheme like we were expecting.

The colors can be changed even via dashboard_info.js for any context or any chart. I am not sure I understand your statement. It seems a wrong statement in my mind. Probably the designer stated that he does not want to change the colors, to have some uniformity. We can change them if we need to.

For this specific feature we will need either to change the JS files to understand what we will show when it receives a heatmap,

In JS there is a color array per dashboard theme (white, dark) for all chart types. Just one for all of them. I understand that front-end engineers will have to add a second color array for heatmaps. So heatmaps will have their own coloring.

or we will need to change our internal API to change heatmap to stack.

I don't get this. What is "internal" in this statement? The agent? The only change in the agent is to support another chart type. It already supports 3, it will now become 4. The agent does not really care what happens with them. It is just a flag, a label that follows every chart.

Considering the previous experience, I understand that the internal API can be the simplest road, but I will have to confirm with our agent team the possible impact on ACLK.

I still don't get it. How can a simple flag impact ACLK? Today charts define themselves as line, area, or stacked. Now there will be another one, heatmap. Why would this affect ACLK?

On the other hand, considering that we will classify our current dashboard as old, we could only add a JS object that has this simple association.

This is totally irrelevant. Even the old dashboard has to map the new chart type to an existing one (heatmap is rendered as stacked). This mapping should not happen at the agent. It is the responsibility of the front-end to deal with it.

I would need to talk with visualization team to verify that we won't have blockers here.

They are here. Ask what you need to know. Avoid meetings please.

ktsaou avatar Feb 10 '22 12:02 ktsaou

I made many updates to my comment above. So please refresh.

ktsaou avatar Feb 10 '22 12:02 ktsaou

The colors can be changed even via dashboard_info.js for any context or any chart.

I agree, but the old designer said we have a color scheme for the dark theme and another completely different one for the white theme, so we could not only change the color. This was the reason I opened that issue, which we later closed in order to create this one.

I don't get this. What is "internal" in this statement? The agent?

The ACLK has changed a lot; I probably had in mind the first scheme that we made. Unless I am wrong, cloud was using api/v1/charts to get the data. If this is not happening, please ignore what I wrote about changing our API.

They are here. Ask what you need to know. Avoid meetings please.

All right. We will do. :handshake:

@hugovalente-pm are we already moving ahead with the two first bullets from this comment? Are we going to convert this issue to epic and create an issue for each one of the bullets?

thiagoftsm avatar Feb 10 '22 12:02 thiagoftsm

I agree, but the old designer said we have a color scheme for the dark theme and another completely different one for the white theme, so we could not only change the color. This was the reason I opened that issue, which we later closed in order to create this one.

Anyway, he was wrong. We can do whatever we want with colors.

ktsaou avatar Feb 10 '22 13:02 ktsaou

@hugovalente-pm are we already moving ahead with the two first bullets from https://github.com/netdata/netdata-cloud/issues/265#issuecomment-1034746584 comment? Are we going to convert this issue to epic and create an issue for each one of the bullets?

@thiagoftsm this is the issue for Cloud FE (just noticed now that I had forgotten to put the label cloud-frontend) from where we drove the discussion and identified the needed changes and from where we need to link to the other tickets - we were doing this as mentions

image

you mention about "convert this issue to epic", what would be the difference?

Btw, there is a main umbrella issue on netdata/product#1795

hugovalente-pm avatar Feb 11 '22 14:02 hugovalente-pm

you mention about "convert this issue to epic", what would be the difference?

This is the way I work with @cpipilas: when we have different bullets we convert the issue to an epic, and after this we create a new issue for each one of the bullets. This way we can monitor the progress step-by-step. If you are not working like this, no problem, I can adapt and write the requirements for the front end here.

thiagoftsm avatar Feb 11 '22 14:02 thiagoftsm

got it @thiagoftsm , we can convert this to an EPIC if it makes tracking easier

hugovalente-pm avatar Feb 11 '22 17:02 hugovalente-pm

@novykh please also add this on your list for review to see if details are ok to push to FE backlog

hugovalente-pm avatar Feb 11 '22 18:02 hugovalente-pm

Charts that could be heatmaps:

  1. idlejitter charts
  2. fping
  3. statsd metrics (TBD which ones)

ktsaou avatar Mar 16 '22 17:03 ktsaou

we agreed to try to start this task this week, considering this comment as the summary of the bullets that need work

@novykh @jjtsou for Cloud FE

  • [ ] make the new heatmap chart type available in our charts library (this repo/ticket)
  • [ ] add the heatmap chart type as an option in the Netdata Cloud chart options (this repo/ticket)

@TonyPath for Cloud BE there are these two bullets that need work

  • [ ] change the Cloud BE API that is consumed by the Cloud FE to support a new chart type (this new ticket on this repo)
  • [ ] update the swagger specs on Cloud BE for the API that is consumed by the Cloud FE, so that the new heatmap chart type option is available (this new ticket on this repo)

@thiagoftsm for Agent we have the tickets that you created already but @ktsaou had also mentioned these below, do we need a ticket for those?

  • idlejitter charts
  • fping
  • statsd metrics (TBD which ones)
  • [ ] update the swagger specs on the Agent for the API that is consumed by the BE, so that the new heatmap chart type option is available (netdata/netdata#12926)
  • [ ] change the collectors that currently support these buckets so that charts with time buckets are defined with the heatmap chart type (netdata/netdata#12925)

@DShreve2 this is the ticket that Tina had created in the past https://github.com/netdata/learn/issues/937

hugovalente-pm avatar May 16 '22 09:05 hugovalente-pm

Hello @hugovalente-pm ,

I think it will be better for the product team to monitor the tasks if we have tickets for each one of the plugins you wrote. This will also help us to split work between developers.

Best regards!

thiagoftsm avatar May 16 '22 13:05 thiagoftsm

thanks @thiagoftsm I hadn't found those tickets hence my question :)

I've moved the previous ones you created, which were under netdata/netdata-cloud, to netdata/netdata and created the following ones (using yours on eBPF and Python as a template)

  • [ ] https://github.com/netdata/netdata/issues/12927
  • [ ] https://github.com/netdata/netdata/issues/12928
  • [ ] https://github.com/netdata/netdata/issues/12929 --> this one doesn't have the list of the charts

to manage expectations, the FE implementation of the heatmap chart type is aimed at Cloud; on the agent dashboard these charts will fall back to a stacked chart - we still need the collectors to flag these charts as heatmaps so that on Cloud they will be properly shown.

hugovalente-pm avatar May 16 '22 13:05 hugovalente-pm

@novykh, in a discussion with @amalkov it seemed to make sense for this one to be @MichaelGamel's next task. Before pushing this forward it would probably make sense to re-assess the estimate on this one

hugovalente-pm avatar Aug 01 '22 10:08 hugovalente-pm

There is an ongoing integration with the uPlot library in the charts repo. This library has a heatmap chart type.

novykh avatar Aug 01 '22 10:08 novykh

ok, cool @novykh please share once you have some more insights into that. I'm curious to understand if it will make it simpler to implement what we have in this user story

hugovalente-pm avatar Aug 01 '22 12:08 hugovalente-pm

@ktsaou from my understanding the work on the agent will only be done once we have the Cloud UI as the Agent dashboard, right?

example:

  • https://github.com/netdata/netdata/issues/12927
  • https://github.com/netdata/netdata/issues/12928

hugovalente-pm avatar Jun 02 '23 15:06 hugovalente-pm