willy born

Results: 45 comments by willy born

I changed the vcpkg description file, vcpkg.json (in the source dir of arrayfire), so that it now uses openblas and fftw3. The improvement in speed on non-Intel CPUs is remarkable. Since...

@umar456 I am not convinced that putting everything together into one kernel will be the fastest solution. I'm especially thinking about: - low cache hit rate, which will screw up...

Here are the intermediate results of the join improvements for OpenCL (CUDA & CPU follow later). Improvements vary depending on the array dimensions (from 7% up to 700x faster). Please consult the attached...

@pradeep No worries. Remarks remain welcome up to the point the PRs are merged. PR#3144 is about the join, memcopy and JIT. PR#3145 is about the usage of join in 2...

Some extra comments on the realized performance impact: - copy linear array (MAX throughput) -- this should be the highest possible throughput; it is performed by the enclosed copy functions (OCL & CUDA)....

@9prady9 The old kernels are no longer valid because of the hash calculation. For JIT kernels, we only take the function name (backend/cuda/jit.cpp:207) into account but...

This will have a serious performance impact, since the code generation takes more time than the execution of the resulting kernel. You will also have to generate the kernel...

I already had all the dims available in int format and no longer in dim_t format, because they are updated inside the 'memcopy.hpp calls'. On top of that, most of the OpenCL...

I notice that this PR is already closed. Does it still make sense to update the code with the comments? Can it still be merged into master or am I...

All remarks are included now, except one on 'src/backend/cuda/kernel/memcopy.cuh' from @9prady9, where I need some help. I will start the testing before releasing. After the discussions, I got some...