Pass value and remove barrier between transform_reduce and reduce_over_group

Open AidanBeltonS opened this issue 2 years ago • 2 comments

This PR modifies how values are passed between transform_reduce and reduce_over_group. Instead of storing the value in local memory, synchronizing the work-group, and then loading it back, we keep the value in a register and return it directly.

This saves the local memory loads and stores, as well as a barrier. We have seen improved performance on both Nvidia and Intel GPUs with this change.
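To make the difference in data flow concrete, here is a minimal host-side sketch in plain C++. It is not the oneDPL implementation: the work-group is emulated with a serial loop, and every name in it (`group_reduce`, `transform_reduce_item`, `reduce_via_local_memory`, `reduce_via_register`) is hypothetical. `group_reduce` only stands in for a group algorithm such as `sycl::reduce_over_group`, and the squaring transform is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a group reduction such as sycl::reduce_over_group:
// combines one value per work-item into a single result.
int group_reduce(const std::vector<int>& per_item_values) {
    return std::accumulate(per_item_values.begin(), per_item_values.end(), 0);
}

// Work done by a single work-item: transform (square) and accumulate a
// strided slice of the input. The result lives in a local variable,
// which models a register on the device.
int transform_reduce_item(const std::vector<int>& in, std::size_t item,
                          std::size_t group_size) {
    int acc = 0;
    for (std::size_t i = item; i < in.size(); i += group_size)
        acc += in[i] * in[i];
    return acc;
}

// Old flow (emulated): store to local memory, barrier, load back, reduce.
int reduce_via_local_memory(const std::vector<int>& in, std::size_t group_size) {
    assert(group_size > 0);
    std::vector<int> local_mem(group_size);  // models work-group local memory
    for (std::size_t item = 0; item < group_size; ++item)
        local_mem[item] = transform_reduce_item(in, item, group_size);  // store
    // On a device, a work-group barrier would be needed at this point.
    std::vector<int> loaded(group_size);
    for (std::size_t item = 0; item < group_size; ++item)
        loaded[item] = local_mem[item];  // load
    return group_reduce(loaded);
}

// New flow (emulated): each work-item hands its value straight to the
// group reduction, with no staging buffer and no barrier in between.
int reduce_via_register(const std::vector<int>& in, std::size_t group_size) {
    assert(group_size > 0);
    // This vector exists only because the emulation is serial; on a device
    // each entry would be a per-work-item register value.
    std::vector<int> values(group_size);
    for (std::size_t item = 0; item < group_size; ++item)
        values[item] = transform_reduce_item(in, item, group_size);
    return group_reduce(values);
}
```

Both functions return the same result. The second one simply drops the store, the barrier, and the load, which are the operations this PR removes.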

AidanBeltonS avatar Jan 22 '24 11:01 AidanBeltonS

@AidanBeltonS Please rebase this PR off main to resolve the conflicts.

julianmi avatar Feb 01 '24 15:02 julianmi

I have rebased the PR.

AidanBeltonS avatar Feb 06 '24 12:02 AidanBeltonS