HDDS-11566. Replication tasks add transferredBytes and queuedTime metrics
What changes were proposed in this pull request?
Added transferredBytes and queuedTime metrics for replication tasks, covering both EC reconstruction and container replication.
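For readers unfamiliar with the two metrics, here is a minimal sketch of what they track. It is not the code in this patch: the class `ReplicationTaskMetricsSketch` and all of its method names are hypothetical, and it uses plain `LongAdder` counters rather than Ozone's metrics framework.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not the class added by this PR.
final class ReplicationTaskMetricsSketch {
  // Cumulative bytes moved by replication / EC reconstruction tasks.
  private final LongAdder transferredBytes = new LongAdder();
  // Cumulative time tasks spent waiting in the queue before starting.
  private final LongAdder queuedTimeMillis = new LongAdder();
  // Number of tasks that left the queue, used to derive an average.
  private final LongAdder tasksStarted = new LongAdder();

  // Call when a task is picked up from the queue.
  void onTaskStarted(long enqueuedAtMillis, long startedAtMillis) {
    queuedTimeMillis.add(Math.max(0L, startedAtMillis - enqueuedAtMillis));
    tasksStarted.increment();
  }

  // Call whenever a task finishes moving a chunk of data.
  void onBytesTransferred(long bytes) {
    transferredBytes.add(bytes);
  }

  long getTransferredBytes() {
    return transferredBytes.sum();
  }

  long getQueuedTimeMillis() {
    return queuedTimeMillis.sum();
  }

  // Average wait per started task; 0 when nothing has started yet.
  double getAverageQueuedTimeMillis() {
    long n = tasksStarted.sum();
    return n == 0 ? 0.0 : (double) queuedTimeMillis.sum() / n;
  }
}
```

Both counters only ever grow, so a monitoring system has to derive rates and averages from successive samples.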
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11566
How was this patch tested?
ci: https://github.com/jianghuazhu/ozone/actions/runs/11310889447
datanode jmx:
@sodonnel @errose28 @fapifta , can you help review this PR? Thanks.
@jianghuazhu I have some questions about these metrics. My understanding is that metrics should have practical measurement value. Each machine may receive a large number of EC recover commands and replication commands; how can the bytes transferred help us? Additionally, these commands themselves have a timeout, so they won't execute beyond that time. What is the value of the queueing time?
cc: @whbing @weimingdiit
My understanding is that metrics should have practical measurement value
@slfan1989 @jianghuazhu I agree with this view. The average waiting time in the queue may be a useful metric, because it tells us how backed up the command queue is. However, I am not sure the transferredBytes metric helps us observe system performance.
We should have some way to approximate how much network traffic is happening due to replication tasks. To really visualize this it would have to go in a dashboard. I think this would look something like this:
- Ozone tracks total number of bytes transferred since the last restart
- In the dashboard (Grafana for example) we approximate network traffic over time by subtracting the previous sampled metric value from the current sampled value.
- Note that the metric's implementation in Ozone would always append to the counter, so the values would never decrease unless the cluster is restarted.
- The chart would then look like a step function where each plateau shows the amount of data transferred between its start and end times
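The steps above can be sketched as follows. This is only an illustration of the arithmetic a dashboard would perform on a cumulative counter; the `CounterDeltas` and `Sample` names are hypothetical, and treating a drop in the counter as a restart is an assumption of this sketch, not something the PR implements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns samples of an ever-growing byte counter
// into approximate bytes-per-second values between samples.
final class CounterDeltas {

  // One scrape of the cumulative counter (hypothetical shape).
  static final class Sample {
    final long epochSeconds;
    final long totalBytes;

    Sample(long epochSeconds, long totalBytes) {
      this.epochSeconds = epochSeconds;
      this.totalBytes = totalBytes;
    }
  }

  static List<Double> bytesPerSecond(List<Sample> samples) {
    List<Double> rates = new ArrayList<>();
    for (int i = 1; i < samples.size(); i++) {
      Sample prev = samples.get(i - 1);
      Sample cur = samples.get(i);
      long seconds = cur.epochSeconds - prev.epochSeconds;
      if (seconds <= 0) {
        continue; // skip duplicate or out-of-order samples
      }
      long delta = cur.totalBytes - prev.totalBytes;
      if (delta < 0) {
        // Assumption: the counter only decreases when the process
        // restarted, so count everything since the restart.
        delta = cur.totalBytes;
      }
      rates.add((double) delta / seconds);
    }
    return rates;
  }
}
```

In practice a dashboard's built-in rate function over the counter would normally do this work, so nothing like this needs to live inside Ozone.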
@kerneltime is this the way such a thing is usually implemented? I'm not sure if we have a standard in other Ozone dashboards for how to chart continuous events like network traffic.
Thanks @errose28 for the comment and review. @kerneltime , do you have any new suggestions? Thanks.
@kerneltime @slfan1989 @weimingdiit how should we proceed on this PR?
- In the dashboard (Grafana for example) we approximate network traffic over time by subtracting the previous sampled metric value from the current sampled value.
Regarding this, I think there are two ways to achieve it:
- Since we now have the total number of bytes transferred, the trend can be derived outside the Ozone system by recording the difference between samples. For example, if the total is 300 MB at 09:10:20 and 400 MB at 09:10:30, then 100 MB were transferred during 09:10:20 ~ 09:10:30.
- A common module could be designed to handle trends such as traffic transfer. It would need to aggregate by time period, for example 5s, 1min, 1h, and regularly collect the difference over each period: 09:10:20 ~ 09:10:25 -> 40 MB, 09:10:30 ~ 09:11:30 -> 500 MB, 09:10:30 ~ 10:10:30 -> 2 GB. These differences should be defined as instantaneous values. In general, the result should look like this:
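As a rough sketch of the per-period aggregation described in the second option, the snippet below groups counter differences into fixed windows. The `WindowedTraffic` class is hypothetical, and two simplifications are assumptions of this sketch: each difference is attributed to the window containing the later sample, and a decreasing counter is treated as a restart.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator: sums counter deltas into fixed-size windows.
final class WindowedTraffic {

  // Returns windowStartSeconds -> bytes transferred, ordered by time.
  static Map<Long, Long> bytesPerWindow(long[] epochSeconds,
                                        long[] totalBytes,
                                        long windowSeconds) {
    if (epochSeconds.length != totalBytes.length || windowSeconds <= 0) {
      throw new IllegalArgumentException("mismatched samples or bad window");
    }
    Map<Long, Long> windows = new TreeMap<>();
    for (int i = 1; i < epochSeconds.length; i++) {
      long delta = totalBytes[i] - totalBytes[i - 1];
      if (delta < 0) {
        delta = totalBytes[i]; // assumed counter reset after a restart
      }
      // Simplification: attribute the delta to the window that
      // contains the later of the two samples.
      long windowStart =
          Math.floorDiv(epochSeconds[i], windowSeconds) * windowSeconds;
      windows.merge(windowStart, delta, Long::sum);
    }
    return windows;
  }
}
```

Calling it with `windowSeconds` set to 5, 60, or 3600 would correspond to the 5s, 1min, and 1h periods mentioned above.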
@adoroszlai @errose28 @kerneltime , what do you think?
@jianghuazhu Thanks for your contribution! However, I still don't fully understand the specific purpose of this metric. In other words, how do fluctuations in this metric help us assess the state of a DataNode? We have a cluster where 99% of the data uses EC, but in my maintenance experience I typically don’t pay much attention to traffic changes caused by reconstruction on a DataNode, because the impact should be negligible compared to read/write traffic. If the practical use of this metric cannot be clearly explained, I think it would be better not to add it to the system, as it would only increase the complexity of the metric framework.
cc: @errose28 @adoroszlai @weimingdiit
- A common module could be designed to handle trends such as traffic transfer. It would need to aggregate by time period, for example 5s, 1min, 1h, and regularly collect the difference over each period: 09:10:20 ~ 09:10:25 -> 40 MB, 09:10:30 ~ 09:11:30 -> 500 MB, 09:10:30 ~ 10:10:30 -> 2 GB. These differences should be defined as instantaneous values.
I partially agree with this viewpoint: network traffic should indeed be presented like the monitoring graph. However, I believe such a complex statistical module should not be added inside Ozone. Instead, it could be implemented in an external metric collection system.
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.