
Strange behavior of distributed client.compute vs. dask.compute when using dask-awkward, dask-histogram


This is an unverified bug, so it's really more startling behavior that I would like to understand.

Unfortunately the reproducer for this is high energy physics analysis code, and I have not yet been able to create a concise reproducer for this behavior. High energy physics task graphs tend to be quite large (thousands of layers), and I'm not sure if this is somehow related.

Steps for a "maximal" reproducer are here: https://gist.github.com/lgray/8a28c5fcd707a2a6778f92cd598f0ca6. I'll continue to try to find something minimal, but it really takes time to figure out from non-expert code. You'll have to set up the calls to dask.compute yourself; I'm happy to help you.

Here is the performance report with client.compute: wwz-dask-report-client-compute.html.zip

Here is the performance report on the same code and task graph for dask.compute: wwz-dask-report-dask-compute.html.zip

In both cases I am using a local distributed client to perform the computation.
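For reference, here is a minimal sketch of the two call paths being compared. It uses a plain dask array as a stand-in for the dask-awkward/dask-histogram collections in the real analysis, so the collection and names are illustrative, not taken from the reproducer:

```python
# Minimal sketch of the two call paths (illustrative only; a plain dask array
# stands in for the dask-awkward/dask-histogram collections in the analysis).
import dask
import dask.array as da
from distributed import Client

if __name__ == "__main__":
    client = Client()  # local distributed cluster, as in both reports

    # Stand-in collection; the real graph has thousands of layers.
    coll = da.random.random((10_000, 10_000), chunks=(1_000, 1_000)).sum()

    # Path 1: dask.compute -- optimizes the graph, submits it, and blocks.
    (result_a,) = dask.compute(coll)

    # Path 2: client.compute -- returns a future immediately; gathering blocks.
    future = client.compute(coll)
    result_b = client.gather(future)
```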

There are two major issues. First, the task graph in the client.compute case doesn't appear to be optimized, or is only partially optimized (the number of tasks differs by about 3k). Second, the later calls that agglomerate histograms seem to stall in the client.compute case, for reasons I cannot deduce. Along these lines, the client.compute case is nearly 2x slower than the dask.compute case, and its memory usage is 3x-4x that of the dask.compute case.
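As a rough way to see how optimization changes the task count, one can compare the sizes of the raw and optimized graphs for the same collection. This is again a sketch with a stand-in collection; the numbers it prints are not the ~3k difference from the reports:

```python
# Compare raw vs. optimized graph sizes for a stand-in collection.
import dask
import dask.array as da

coll = da.random.random((10_000, 10_000), chunks=(1_000, 1_000)).sum()

raw_tasks = len(coll.__dask_graph__())   # unoptimized high-level graph
(opt,) = dask.optimize(coll)
opt_tasks = len(opt.__dask_graph__())    # after dask's optimization pass

print(raw_tasks, opt_tasks)  # the optimized graph is typically smaller
```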

Of note: it seems that the Client thread itself stalls when using client.compute, since the dashboard stalls when the histogram tree-reduce starts. This doesn't happen in the dask.compute case, which is truly odd.

If I optimize the task graph ahead of time and pass that to client.compute, the number of tasks run by client.compute makes sense, but the stalling issue and the corresponding compute slowdown are still there.
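For concreteness, the ahead-of-time variant I mean looks roughly like this (again with a stand-in collection rather than the analysis code):

```python
# Optimize the collection ahead of time, then hand the result to client.compute.
import dask
import dask.array as da
from distributed import Client

client = Client()
coll = da.random.random((10_000, 10_000), chunks=(1_000, 1_000)).sum()

(optimized,) = dask.optimize(coll)

future = client.compute(optimized)   # the task count now looks sensible
result = client.gather(future)       # but the tree-reduce stall remains
```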

cc: @martindurant

lgray · Feb 29 '24 17:02