gaohao95
> I think it will be cleaner to have two trees, one host-side, and one device-side, in place of the dual-storage solution at the array level that is currently in...
This feature would be a great help for us. We use `cudf::table` as CPU-memory storage to enable zero-copy. In theory, CPU memory has more capacity to hold larger tables,...
On the performance side, I tested 67c683b18a46686867f3f0b5e65bcff79e4d6f74 on an A100 with 1 billion probe-table rows, various build-table row counts, and a selectivity of 0.01. The speedup is good, a solid...
> Are you referring to GPU occupancy or hash map occupancy (or load factor)?

The occupancy of the kernel `unique_join_probe_kernel`.
One potential cause of this OOM is that Thrust does not use the RMM memory pool during table generation: https://github.com/rapidsai/distributed-join/blob/bc7563a5f5ef2a76bfa0ac275e304a9540f3fa3d/generate_dataset/generate_dataset.cuh#L231
> @gaohao95 is this still a problem, and if so, could it be one of the reasons for #51?

I don't think it's related, since #51 happens without using `UCXBufferCommunicator`.
> I can't find `cudaStreamCreateWithFlags` in our code - did you mean that `cudaStreamDefault` is passed into RMM functions (allocate/deallocate), and we should replace it with `rmm::cuda_stream_default`?

Yes. Basically we...
The use case is loading a [Hive-partitioned dataset](https://duckdb.org/docs/data/partitioning/hive_partitioning) efficiently. To reconstruct the partition key columns, we need to know how many rows are loaded from each file. I currently don't...
> I think we can report the number of rows read per file when AST filter (row selection) is not being used. Would this work for you?

Works for my...
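
To illustrate why per-file row counts are enough for the Hive-partitioning use case, here is a minimal Python sketch of the reconstruction step. It is not cuDF API: `parse_hive_keys` and `rebuild_partition_columns` are hypothetical helper names, the paths are made up, and the row counts are assumed to come from whatever the reader reports per file. It also assumes every file sits under the same set of `key=value` directories and that no row filter dropped rows.

```python
def parse_hive_keys(path):
    """Return the key=value directory components of a Hive-style path."""
    keys = {}
    # Skip the last component, which is the file name itself.
    for part in path.split("/")[:-1]:
        if "=" in part:
            name, value = part.split("=", 1)
            keys[name] = value
    return keys


def rebuild_partition_columns(paths, rows_per_file):
    """Repeat each file's partition values once per row read from that file.

    Assumes rows_per_file[i] is the number of rows loaded from paths[i]
    and that the loaded rows are concatenated in the same file order.
    """
    if len(paths) != len(rows_per_file):
        raise ValueError("need exactly one row count per file")
    columns = {}
    for path, num_rows in zip(paths, rows_per_file):
        for name, value in parse_hive_keys(path).items():
            columns.setdefault(name, []).extend([value] * num_rows)
    return columns


# Hypothetical dataset layout and per-file row counts.
paths = [
    "data/year=2023/month=1/part-0.parquet",
    "data/year=2023/month=2/part-0.parquet",
]
rows_per_file = [2, 3]
partition_columns = rebuild_partition_columns(paths, rows_per_file)
```

The values stay as strings here; casting them to the partition column's real type is left out. If a filter removes rows during the read, the counts would need to reflect the rows that survive, which is why the unfiltered case is the simple one.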