Tadej Ciglarič

Results 15 issues of Tadej Ciglarič

Introduces support for having multiple CUDA devices in one context. To facilitate moving buffer and image memory between devices within the same context, some ABI-breaking changes had to be made...

Fixes a bug in the barrier implementation in the CUDA and HIP plugins that often caused barriers not to work. The new implementation is also faster. Tests in: https://github.com/intel/llvm-test-suite/pull/1122

## Summary Optimize the kernel generator so it can use `matrix_cl`'s move assignment where possible instead of copying the data. ## Tests Added a test to check that the new optimization...
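The difference between move and copy assignment that this optimization relies on can be sketched as follows. This is a minimal illustration only: `Buffer`, `assign`, and `copy_count` are hypothetical names, and a `std::vector` stands in for device memory. It is not Stan Math's `matrix_cl` or its kernel generator.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts deep copies so the effect of moving is observable (illustration only).
int copy_count = 0;

// Hypothetical stand-in for a device matrix; NOT Stan Math's matrix_cl.
struct Buffer {
  std::vector<double> data;

  Buffer() = default;
  explicit Buffer(std::size_t n, double v = 0.0) : data(n, v) {}

  // Copying duplicates the storage.
  Buffer(const Buffer& other) : data(other.data) { ++copy_count; }
  Buffer& operator=(const Buffer& other) {
    data = other.data;
    ++copy_count;
    return *this;
  }

  // Moving only transfers ownership of the storage.
  Buffer(Buffer&&) = default;
  Buffer& operator=(Buffer&&) = default;
};

// Assigns src to dst. An rvalue source gives up its storage (move assignment);
// an lvalue source is left intact and copied.
template <typename Src>
void assign(Buffer& dst, Src&& src) {
  assert(&dst != &src);  // self-assignment is not expected in this sketch
  dst = std::forward<Src>(src);
}
```

Calling `assign(dst, a)` with a named `a` copies the data, while `assign(dst, std::move(b))` or assigning a temporary reuses the existing storage.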

## Description Eigen's QR decomposition can be improved with better parameter tuning. GPUs can be used for further speedup. ## Example QR decomposition is faster. ## Expected Output QR...

This is already a work in progress. Anyway, @bbbales2 asked me to write this, and I agree it would be beneficial to have this written somewhere. Pinging some people that might...

Now that we are using Eigen expressions in Stan Math, it makes sense for the compiler to merge multiple expressions into a single one where possible to avoid unnecessary memory accesses. This...
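Why merging expressions saves memory accesses can be sketched with plain vectors. The functions `add`, `scale`, `unfused`, and `fused` below are hypothetical and written only for this illustration; they use neither Eigen nor the Stan compiler. The unfused version writes and re-reads a temporary, while the fused version computes the same result in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pass over x and y, result written to a new vector.
std::vector<double> add(const std::vector<double>& x,
                        const std::vector<double>& y) {
  assert(x.size() == y.size());
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + y[i];
  return out;
}

// One pass over x, result written to a new vector.
std::vector<double> scale(const std::vector<double>& x, double k) {
  std::vector<double> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] * k;
  return out;
}

// Unfused: two loops, and the intermediate sum is stored and read back.
std::vector<double> unfused(const std::vector<double>& a,
                            const std::vector<double>& b, double k) {
  return scale(add(a, b), k);
}

// Fused: a single loop computes (a + b) * k with no intermediate vector.
std::vector<double> fused(const std::vector<double>& a,
                          const std::vector<double>& b, double k) {
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = (a[i] + b[i]) * k;
  return out;
}
```

Both functions return the same values; the fused form simply touches memory fewer times, which is the kind of saving the issue describes.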

optimization

Adds support for the post-ops on the binary primitive that were previously unsupported (binary, prelu, remaining eltwise ones). Also fixes a bug in three-way type equality comparison that prevented binary...

Adds a SYCL implementation of the reorder primitive.

platform:nvidia-gpu
platform:amd-gpu

Adds support for non-default memory formats to the SYCL binary primitive. Also fixes some failing benchdnn tests.

Implements the concat primitive by calling reorder for each of the inputs. Also adds support for an offset in reorder.

platform:nvidia-gpu
platform:amd-gpu
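
The idea of building concat from repeated reorder calls with a destination offset can be sketched on flat arrays. The `reorder` and `concat` functions below are hypothetical simplifications written for this illustration; a real reorder primitive also handles memory formats and data types, and none of this is the oneDNN API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reorder: copies src into dst starting at dst_offset.
// A real reorder would also convert layout and data type.
void reorder(const std::vector<float>& src, std::vector<float>& dst,
             std::size_t dst_offset) {
  assert(dst_offset + src.size() <= dst.size());
  for (std::size_t i = 0; i < src.size(); ++i) dst[dst_offset + i] = src[i];
}

// Concat as one reorder per input, each writing at a running offset.
std::vector<float> concat(const std::vector<std::vector<float>>& inputs) {
  std::size_t total = 0;
  for (const auto& in : inputs) total += in.size();

  std::vector<float> dst(total);
  std::size_t offset = 0;
  for (const auto& in : inputs) {
    reorder(in, dst, offset);
    offset += in.size();  // the next input starts where this one ended
  }
  return dst;
}
```

The offset parameter is what lets each call write into its own slice of the shared destination without a separate copy step.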