Status Json Generation Can Be Very Long

Open jzhou77 opened this issue 3 years ago • 1 comments

clusterGetStatus() does many steps in serial order. Even though each step typically has a timeout, the total time can be very long, especially when there are faults in the cluster, e.g., when some storage servers are unavailable.

A second problem is that if a new step is added without a proper timeout, the total time becomes unbounded. This is bad because operational tools typically depend on the output of status json.

To bound the time for status generation, we need to run the steps in parallel wherever possible. This probably calls for a refactoring of the code that addresses both of the above problems.
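As a rough illustration of the idea (not FoundationDB code, which is C++/Flow), here is a minimal Python asyncio sketch. All names in it (`run_step`, `gather_status`, `fetch_cluster_config`, `fetch_storage_metrics`) are hypothetical. It assumes each status step can be expressed as an independent async function. The driver wraps every step in the same timeout, so a newly added step cannot run unbounded, and the steps run concurrently, so the total time is roughly the slowest step (capped by the timeout) rather than the sum.

```python
import asyncio
import time


async def run_step(name, step_fn, timeout):
    """Run one status step under a timeout; report a failure instead of raising."""
    try:
        return name, await asyncio.wait_for(step_fn(), timeout)
    except asyncio.TimeoutError:
        # The step is cancelled; the status section records that it timed out.
        return name, {"error": "timed_out"}


async def gather_status(steps, per_step_timeout):
    """Run all steps concurrently. The driver applies the timeout, so a step
    added later cannot accidentally be left without one."""
    results = await asyncio.gather(
        *(run_step(name, fn, per_step_timeout) for name, fn in steps.items())
    )
    return dict(results)


# Hypothetical steps, for illustration only.
async def fetch_cluster_config():
    await asyncio.sleep(0.01)
    return {"redundancy": "triple"}


async def fetch_storage_metrics():
    await asyncio.sleep(5)  # simulates an unavailable storage server
    return {"servers": 3}


if __name__ == "__main__":
    steps = {"config": fetch_cluster_config, "storage": fetch_storage_metrics}
    start = time.monotonic()
    status = asyncio.run(gather_status(steps, per_step_timeout=0.2))
    print(status)
    print(f"elapsed: {time.monotonic() - start:.2f}s")
```

In this sketch the slow step is reported as timed out while the fast step still returns its data, so a partial status document is produced within roughly `per_step_timeout`. Steps that depend on each other's output would still need to be sequenced, which this sketch does not cover.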

jzhou77, Jul 25 '22 00:07

@sfc-gh-satherton mentioned that currently there is an optimization in the status json generation that reduces memory copies, which reduces the time from a few seconds to less than one second. The refactor should consider this optimization as well to reduce the total time.

jzhou77, Jul 27 '22 17:07