ssadasivam1

5 comments by ssadasivam1

I went back and checked what exactly is wrong. It seems that the filtered output in vecD is wrong for a small fraction of the elements (roughly 30-40 elements are...

Another observation, in case it helps with debugging: the number of wrong elements seems to be 16, 32, 48, or 64, so it seems to favor multiples of 16...

Thanks, I will compile with `-gencode` and check. I also just increased the number of elements N to 5 million (it was 1 million earlier), which seems to increase the...

Update: I'm also unable to reproduce the issue on an A100 when compiling with `-gencode arch=compute_80,code=sm_80`. It does reproduce when the code is not compiled for that specific architecture.

@elstehle Our application, from which this simplified standalone reproducer was extracted, still fails with CUDA 12.4 and CUDA 12.5, so I believe this issue still exists in our app,...