Tanmay Gujar
I think the approach of specializing the type dispatcher is very cumbersome and will lead to a lot of code duplication. Currently, I have the conditional dispatch working for `device_row_hasher`...
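A minimal sketch of conditional dispatch, assuming the goal is to keep nested (complex) type branches out of specializations that never see them. This is standalone C++17 and is not cudf's `type_dispatcher`: `toy_type`, `list_tag`, `struct_tag`, `dispatch`, `type_name_fn`, and the `HasNested` flag are hypothetical names. With `HasNested = false`, the nested branches are discarded by `if constexpr` and never instantiated.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical runtime type tag (a stand-in, not cudf::type_id).
enum class toy_type { INT32, INT64, FLOAT64, LIST, STRUCT };

// Hypothetical marker types for nested columns.
struct list_tag {};
struct struct_tag {};

// Map a runtime toy_type to a compile-time type and call the functor with it.
// When HasNested is false, the nested branch is a discarded statement, so the
// functor is never instantiated for list_tag/struct_tag in that version.
template <bool HasNested, typename Functor>
auto dispatch(toy_type id, Functor f) {
  switch (id) {
    case toy_type::INT32: return f.template operator()<std::int32_t>();
    case toy_type::INT64: return f.template operator()<std::int64_t>();
    case toy_type::FLOAT64: return f.template operator()<double>();
    case toy_type::LIST:
    case toy_type::STRUCT:
      if constexpr (HasNested) {
        return id == toy_type::LIST ? f.template operator()<list_tag>()
                                    : f.template operator()<struct_tag>();
      } else {
        throw std::logic_error("nested type reached a flat-only dispatch");
      }
  }
  throw std::logic_error("unknown toy_type");
}

// Example functor (hypothetical): reports which element type was selected.
struct type_name_fn {
  template <typename T>
  std::string operator()() const {
    if constexpr (std::is_same_v<T, std::int32_t>) {
      return "int32";
    } else if constexpr (std::is_same_v<T, std::int64_t>) {
      return "int64";
    } else if constexpr (std::is_same_v<T, double>) {
      return "float64";
    } else {
      return "nested";
    }
  }
};
```

In a real row hasher the functor would hash an element rather than return a name; the sketch only shows where the compile-time cutoff sits.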
Since option 1 doesn't incur the cost of JIT compilation, it may be the better approach in terms of performance. My current plan is to arrange the types in increasing...
Although I tested this only for `mixed_semi_join`, it should be applicable to all hash joins that use `device_row_hasher` and `device_row_comparator`. This allows us to compile different versions of the...
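Compiling different versions and choosing one at runtime can be sketched as a single host-side check followed by a call into a template specialized on a bool. Assumption: the versions differ only in whether nested (complex) key types are supported. `toy_type`, `is_nested`, `launch_probe`, and `select_probe` are hypothetical names, and `launch_probe` merely stands in for a kernel launch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical runtime type tag (a stand-in, not cudf::type_id).
enum class toy_type { INT32, INT64, FLOAT64, LIST, STRUCT };

bool is_nested(toy_type id) { return id == toy_type::LIST || id == toy_type::STRUCT; }

// Stand-in for a join kernel launch. Each value of HasNested is a separately
// compiled version; the flat one contains no nested-type code paths.
template <bool HasNested>
std::string launch_probe() {
  if constexpr (HasNested) {
    return "probe<nested>";
  } else {
    return "probe<flat>";
  }
}

// Inspect the key column types once on the host, then pick the version.
std::string select_probe(const std::vector<toy_type>& key_types) {
  bool const any_nested = std::any_of(key_types.begin(), key_types.end(), is_nested);
  return any_nested ? launch_probe<true>() : launch_probe<false>();
}
```

The check runs once per join on the host, so tables without nested keys never pay for nested-type branches in device code.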
Ah okay, this would also achieve what we need without adding complexity to the `type_dispatcher` switch. Makes sense!
Adding results for reference: cudf benchmarks for all join types on an A100, showing the speedups from disabling complex types. ``` # inner_join ## [0] NVIDIA A100-PCIE-40GB | Key | Nullable | left_size...
I can work on this if we can confirm that this is indeed a correctly reported bug. Let me know what you think, thanks!
I think this would push the responsibility onto the user to figure out what may be non-deterministic. I am not sure this would be a good approach.
Specializing both the comparator and the hasher drops the register usage to 54 instead of the expected 46 for the mixed semi join case. Investigating why the register pressure is...
I have a question here: should I make the changes to all the join operations in this PR, or split them into separate PRs?
Benchmark results: this MR adds specialized dispatch for both build and probe in hash joins, and only for build in mixed semi/anti joins. Other joins are not modified...