opencl: fix rms_norm_mul

Open lhez opened this issue 5 months ago • 0 comments

The rms_norm_mul kernel produces incorrect results when ne00 = 768. This PR changes how the kernel performs the reduction that computes the sum, which seems to fix the issue.

Nov 13 '25 19:11 lhez