georgh
Are you sure? `left` has a fixed size here; for me that fails (even with the patch applied). Expected: `lDims.ndims() == rDims.ndims()`. But even if you add this to matmul,...
Ok, it's really nice to know that tile doesn't need more memory :) But still, shouldn't

```
for kk in af.ParallelRange(10):
    res[:,:,kk] += tmp[kk]
```

work?
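For reference, here is what that loop is meant to compute, written as a plain NumPy sketch (an assumption on my part: this mirrors the intended accumulation semantics, it is not ArrayFire code, and the shapes are made up to match the snippet):

```python
# NumPy sketch of the accumulation the ParallelRange loop expresses:
# add each slab tmp[kk] into the kk-th trailing slice of res.
import numpy as np

tmp = np.ones((10, 20, 20))   # tmp[kk] is a 20x20 slab
res = np.zeros((20, 20, 10))

for kk in range(10):          # af.ParallelRange would evaluate these in batch
    res[:, :, kk] += tmp[kk]

print(res.sum())  # 10 slices * 400 ones each -> 4000.0
```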
Sorry, my bad; aside from the first case, ParallelRange works fine. But I still don't think the first case is a problem with matmul. Full example: ``` import arrayfire as af...
Or even simpler:

```
import arrayfire as af
af.set_backend('cuda')
res = af.constant(0, 20, 20, 10)
xx = af.constant(0, 20, 20)
yy = af.constant(0, 20, 20)
for ii in af.ParallelRange(10):
    res[:,:,ii] = af.matmul(xx, yy)
```

s...
The main remaining question would be how to use the CPU during the GPU computation. Do you split the work with multiprocessing, or is there an easier way?
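One stdlib-only pattern that avoids multiprocessing is to launch the device work on a worker thread and keep the main thread busy with host-side work until you synchronize. This is a minimal sketch, assuming the device call releases the GIL (as blocking GPU library calls typically do); `gpu_task` and `cpu_task` are hypothetical stand-ins, not ArrayFire API:

```python
# Sketch: overlap CPU work with an asynchronous "device" computation
# using a single worker thread instead of multiprocessing.
from concurrent.futures import ThreadPoolExecutor

def gpu_task(n):
    # placeholder for device work, e.g. a matmul launched on the GPU
    return sum(i * i for i in range(n))

def cpu_task(n):
    # independent host-side work that can run concurrently
    return sum(range(n))

def overlapped(n):
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(gpu_task, n)  # "launch" device work
        cpu_result = cpu_task(n)        # CPU keeps working meanwhile
        gpu_result = fut.result()       # synchronize at the end
    return cpu_result, gpu_result

print(overlapped(10))  # (45, 285)
```

Whether this actually overlaps with ArrayFire depends on its launch semantics; many of its operations are already asynchronous until a sync point, which would make a plain thread sufficient.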
At least until https://github.com/arrayfire/arrayfire/pull/1898 is merged and released
And I found out that you can reduce the dimensionality if you order your data the right way:

```
x = af.constant(0, 5, 5, 10, dtype=af.Dtype.f64)
af.matmul(x[:,:,0], x[:,:,0])  # works
```
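The same layout trick in NumPy terms (an assumption: this is an analogue, not the ArrayFire API): keep the matrix dimensions first and the batch axis last, so each trailing-axis slice is an ordinary 2-D matrix.

```python
# Batch of 10 stacked 5x5 matrices, batch axis last:
# x[:, :, k] is a plain 2-D matrix, so a per-slice matmul just works.
import numpy as np

x = np.zeros((5, 5, 10))
res = np.zeros_like(x)
for k in range(x.shape[2]):
    res[:, :, k] = x[:, :, k] @ x[:, :, k]

print(res.shape)  # (5, 5, 10)
```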
Yeah, reordering my code is probably the better idea :)