GPUSorting

Possible incorrect sorting on Wave Size 128

Open b0nes164 opened this issue 1 year ago • 8 comments

See https://github.com/aras-p/UnityGaussianSplatting/issues/112

b0nes164 avatar Apr 21 '24 13:04 b0nes164

Possibly on wave size 64 as well, since Qualcomm Adreno GPUs have a default subgroup size of 64.

alasin avatar Apr 22 '24 12:04 alasin

As stated in the README, I have tested on wave size 64.

b0nes164 avatar Apr 22 '24 13:04 b0nes164

Was it on an AMD GPU? Or Qualcomm Adreno?

alasin avatar Apr 23 '24 17:04 alasin

Tested on a 7900 XT with wave size locked to 64 using [WaveSize(<numLanes>)]. That being said, I have an Adreno 618 on the way, so I will be able to debug it then.

b0nes164 avatar Apr 23 '24 17:04 b0nes164

Welp, that's extremely disappointing. Debug

I will have to get another device. The problem is that I can't readily find information on which Qualcomm devices support WaveIntrinsics. The best I can find is this. I also don't have the resources at the moment to go out and buy a laptop just to test my code on, so for now this will have to be on hold.

b0nes164 avatar Apr 29 '24 19:04 b0nes164

I usually use gpuinfo to check subgroup support in Vulkan for Qualcomm devices. Subgroup ops correspond one-to-one to wave ops in most cases. I also have a Quest 3 for which subgroup size is 64 (not sure if it can be changed to 128). If you need help in testing, I'd be happy to help debug.

alasin avatar Apr 30 '24 02:04 alasin

> I usually use gpuinfo to check subgroup support in Vulkan for Qualcomm devices.

I totally forgot that Vulkan has its own database. In fact, if I remember correctly, the D3D12 one is a fork of, or at least based on, the Vulkan one.

> I also have a Quest 3 for which subgroup size is 64 (not sure if it can be changed to 128). If you need help in testing, I'd be happy to help debug.

Do you know if it is possible to run this implementation on a Windows PC, using the Quest 3's Qualcomm chip as the device? The problem is that the D3D12 implementation probably will not run natively on the Quest 3. I've been meaning to make a Vulkan implementation, which shouldn't be too bad (DXC can compile to SPIR-V, so I would precompile the shaders from HLSL), but I just haven't had the time.

However, a user from another repo also has a Quest 3, and has offered to run tests for me, but in Unity. I believe that's the most straightforward course of action, because Unity handles the transpilation from HLSL to SPIR-V and can use Vulkan as the backend to run on Quest.

I very much appreciate the help though.

b0nes164 avatar May 01 '24 16:05 b0nes164

> Do you know if it is possible to run this implementation on a Windows PC, using the Quest 3's Qualcomm chip as the device?

I doubt it, or at least I haven't seen it done before. It should be possible through Unity though. Looks like you're already on top of it!

alasin avatar May 02 '24 15:05 alasin

You can try changing waveFlags &= t ? ballot : ~ballot; to waveFlags &= (t ? ballot : (~ballot)); in the function WarpLevelMultiSplitWGE16. With this change, a small Gaussian Splatting model can be displayed. @b0nes164
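For anyone following along, here is a rough CPU-side sketch in C++ of what a ballot-based multi-split loop computes, so the two spellings of that line can be compared directly. It is not the repository's code: `emulatedBallot`, `matchLanes`, and the `parenthesized` switch are hypothetical names made up for illustration, and the emulation only handles waves of up to 64 lanes (a 128-lane wave would need more than one 64-bit word). In C++ the conditional operator binds tighter than `&=`, so both spellings should give identical results, and HLSL is expected to follow the same precedence. If the parentheses change behaviour on a device, that would point at the shader toolchain rather than the algorithm.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulated wave ballot: bit i is set when lane i's predicate is true.
// Assumption: at most 64 lanes, so one 64-bit word is enough.
std::uint64_t emulatedBallot(const std::vector<bool>& predicate) {
    std::uint64_t mask = 0;
    for (std::size_t lane = 0; lane < predicate.size(); ++lane) {
        if (predicate[lane]) mask |= (std::uint64_t{1} << lane);
    }
    return mask;
}

// For every lane, build the mask of lanes whose key shares the same low
// `radixBits` bits. `parenthesized` picks which spelling of the update runs.
std::vector<std::uint64_t> matchLanes(const std::vector<std::uint32_t>& keys,
                                      int radixBits, bool parenthesized) {
    const std::size_t n = keys.size();
    const std::uint64_t allLanes =
        (n >= 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
    std::vector<std::uint64_t> waveFlags(n, allLanes);

    for (int bit = 0; bit < radixBits; ++bit) {
        // t[lane] is this lane's value for the current key bit.
        std::vector<bool> t(n);
        for (std::size_t lane = 0; lane < n; ++lane) {
            t[lane] = ((keys[lane] >> bit) & 1u) != 0;
        }
        const std::uint64_t ballot = emulatedBallot(t);

        for (std::size_t lane = 0; lane < n; ++lane) {
            if (parenthesized) {
                waveFlags[lane] &= (t[lane] ? ballot : (~ballot));
            } else {
                waveFlags[lane] &= t[lane] ? ballot : ~ballot;
            }
        }
    }
    return waveFlags;
}
```

Calling `matchLanes({3, 1, 3, 0, 1, 3, 2, 0}, 2, false)` and the same call with `true` should return the same masks, with each lane's mask marking exactly the lanes that hold the same 2-bit digit.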

LeeSYSU avatar Aug 20 '24 11:08 LeeSYSU

Closing this issue, as I have confirmed the bug does not have to do with the wave size. @LeeSYSU, copying your comment to #4.

b0nes164 avatar Aug 20 '24 15:08 b0nes164