willy born comments

Results: 45 comments by willy born

[Question] How to disable intel mkl?

I changed the vcpkg description file, vcpkg.json (in the source dir of arrayfire), so that it now uses openblas and fftw3. The speed improvement on non-Intel CPUs is remarkable. Since...

Improve performance of join

@umar456 I am not convinced that putting everything together into one kernel will be the fastest solution. I'm especially thinking about: - low cache hit rate, which will screw up...

Improve performance of join

Here are the intermediate results of the join improvements for OpenCL (CUDA & CPU will follow later). Improvements vary depending on the array dimensions (from 7% up to 700x faster). Please consult the attached...

Improve performance of join

@pradeep No worries. Remarks remain welcome until the PRs are merged. PR#3144 is about the join, memcopy and JIT. PR#3145 is about the usage of join in 2...

OPT: Improved memcopy, JIT & join

Some extra comments on the realized performance impact: - copy linear array (MAX throughput) -- should be the highest possible throughput; it is performed by the enclosed copy functions (OCL & CUDA)....

OPT: Improved memcopy, JIT & join

@9prady9 The reason the old kernels are no longer valid is a result of the hash calculation. For JIT kernels, we only take the function name (backend/cuda/jit.cpp:207) into account but...

OPT: Improved memcopy, JIT & join

This will have a serious performance impact, since the code generation takes more time than the execution of the resulting kernel. You will also have to generate the kernel...

OPT: Improved memcopy, JIT & join

I already had all the dims available in int format and no longer in dim_t format, because they are updated inside the 'memcopy.hpp calls'. On top of that, most of the OpenCL...

OPT: Improved memcopy, JIT & join

I notice that this PR is already closed. Does it still make sense to update the code with the comments? Can it still be merged into master or am I...

OPT: Improved memcopy, JIT & join

All remarks are included now, except one on 'src/backend/cuda/kernel/memcopy.cuh' from @9prady9, where I need some help. I will start the testing before releasing. After the discussions, I got some...