pycuda
Adding two arrays is much slower on the GPU
Hi, I am adding two large arrays, and for some reason (probably my thread/block/grid size choices) the CPU version is much faster.
At first I thought the arrays were too small for the GPU to show any advantage. However, I increased the array size until I got a CUDA out-of-memory error, and the GPU version was still slower.
The full code is here: https://github.com/QuantScientist/Data-Science-ArrayFire-GPU/blob/master/PyCUDA/02%20add%20with%20PyCUDA.ipynb
My device info is as follows:
1 device(s) found.
Device #0: GeForce GTX 1080
Thanks,
This issue tracker is for bugs, not technical support. Please send a message to the mailing list for tech support.
Look up PCIe bandwidth to help figure out why your code behaves the way it does.
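To make that hint concrete, here is a rough back-of-envelope sketch in plain Python (no GPU needed). It estimates how long it takes just to move the data for an element-wise add at a given bandwidth. The numbers are assumptions, not measurements: about 12 GB/s of usable bandwidth for a PCIe 3.0 x16 link, about 20 GB/s for host memory, and 320 GB/s for the GTX 1080's on-board memory (its spec-sheet figure). The array length and the helper name `memory_bound_seconds` are made up for illustration.

```python
def memory_bound_seconds(n_elements, bandwidth_gb_s, bytes_per_element=4, n_arrays=3):
    """Time to move n_arrays arrays of n_elements each at the given bandwidth.

    For c = a + b, three arrays are touched: a and b are read, c is written.
    On the GPU path, a and b also cross PCIe to the device and c crosses back.
    """
    total_bytes = n_elements * bytes_per_element * n_arrays
    return total_bytes / (bandwidth_gb_s * 1e9)


n = 100_000_000  # hypothetical array length, float32 elements

# Assumed bandwidths in GB/s; real values depend on the machine.
pcie_s = memory_bound_seconds(n, bandwidth_gb_s=12.0)   # host <-> device copies
gpu_s = memory_bound_seconds(n, bandwidth_gb_s=320.0)   # the add kernel itself
cpu_s = memory_bound_seconds(n, bandwidth_gb_s=20.0)    # the add done on the CPU

print(f"PCIe copies : {pcie_s * 1e3:8.2f} ms")
print(f"GPU kernel  : {gpu_s * 1e3:8.2f} ms")
print(f"CPU add     : {cpu_s * 1e3:8.2f} ms")
print(f"GPU total   : {(pcie_s + gpu_s) * 1e3:8.2f} ms")
```

Under these assumptions the PCIe copies alone take longer than the whole CPU add, so a timing that includes the host-to-device and device-to-host transfers will favor the CPU no matter how the thread/block/grid sizes are tuned. An addition does so little work per byte that the kernel has no chance to amortize the transfer cost.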