distributed-join
By default, the memory pool size is the total GPU memory minus 500MB. During some OOM runs, we observed that using a smaller memory pool resolved the OOM issue. This indicates...
`UCXBufferCommunicator` allocates buffers for `count` and `recv_buffer` used in callbacks but never frees them. Although these buffers are small (8 bytes each), we should consider cleaning them up properly.
`cudaStreamDefault` is a flag meant to be passed to `cudaStreamCreateWithFlags`, not a valid stream handle. We should replace it with either `rmm::cuda_stream_default` or `0`.
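To illustrate the distinction (a sketch, not the repo's actual call site):

```cpp
#include <cuda_runtime.h>

int main() {
  // cudaStreamDefault is a stream-creation *flag* (value 0x00); it is only
  // valid as the `flags` argument of cudaStreamCreateWithFlags:
  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamDefault);

  // Where an API expects a stream handle, pass an actual stream (or the
  // default stream, i.e. rmm::cuda_stream_default or 0), never the flag:
  cudaMemsetAsync(nullptr, 0, 0, stream);  // ok: `stream` is a handle
  cudaStreamDestroy(stream);
  return 0;
}
```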
We should check the device memory usage and compare it with what we projected in the modeling, both with and without pipelining. We should consider whether the extra device usage is...
We should profile and improve the computation-communication overlap efficiency on
- a single-node DGX with NVLink
- multiple DGX nodes connected with IB
The error-checking utilities of this repo (currently located at `src/error.cuh`) should be aligned with cuDF's error-checking utilities (`cudf/utilities/error.hpp`). I believe this will allow more code reuse. For example,...
- Currently `UCXCommunicator` uses a different communication-tag design than `UCXBufferCommunicator`. We should bring them in line with each other.
- Currently, when there's no comm buffer available in the buffer...