[SYCL][CUDA] Allow joint_matrix to be loaded from const T
Fixes a bug where joint_matrix_load, when asked to load a joint_matrix from an array of const T, either behaves incorrectly or throws an error. The fix uses std::remove_const_t<T> in the appropriate places. This functionality is important for integrating joint_matrix with existing SYCL-DNN routines.
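For illustration only, here is a minimal host-side sketch of the general technique, written in plain C++ so it can be compiled without a SYCL toolchain. The names `fragment`, `fragment_load`, and `sum_loaded_from_const` are hypothetical stand-ins and are not the joint_matrix API or the code changed in this PR. The sketch assumes the idea is to let the source element type T be const-qualified while the destination storage type is `std::remove_const_t<T>`.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a matrix fragment; NOT the real joint_matrix.
template <typename S, std::size_t N>
struct fragment {
  static_assert(!std::is_const_v<S>, "fragment storage must be writable");
  std::array<S, N> data{};
};

// Hypothetical load. T is deduced from the source pointer and may be
// const-qualified; the destination element type S must equal T with the
// top-level const stripped.
template <typename S, std::size_t N, typename T>
void fragment_load(fragment<S, N>& dst, T* src) {
  static_assert(std::is_same_v<S, std::remove_const_t<T>>,
                "destination type must match source type without const");
  for (std::size_t i = 0; i < N; ++i) {
    dst.data[i] = src[i];
  }
}

// Loads from a const array (T is deduced as const int) and sums the result.
inline int sum_loaded_from_const() {
  const int src[4] = {1, 2, 3, 4};
  fragment<int, 4> frag;
  fragment_load(frag, src);
  int total = 0;
  for (int v : frag.data) {
    total += v;
  }
  return total;
}
```

In this sketch, requiring the destination type to be exactly `T` would reject the call in `sum_loaded_from_const`, because `fragment<const int, 4>` could not be written to. Comparing against `std::remove_const_t<T>` accepts both const and non-const sources.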
I think similar problems might occur in the existing implementation for the Intel backends. I have not made corresponding changes because I do not have the hardware to test them.
Signed-off-by: JackAKirk [email protected]
btw the CUDA test failure is the now XFAIL'ed known flaky test Assert/assert_in_simultaneously_multiple_tus.cpp
ping @v-klochkov
/verify with https://github.com/intel/llvm-test-suite/pull/1280
Looks good to me. Thank you. Before proceeding to merge, please add a test verifying the case fixed by this PR.
If there is an existing test, but it is XFAIL due to being flaky, please add a simpler test or a compile-only test.
Thanks. I've added all the newly possible use cases to the test here: https://github.com/intel/llvm-test-suite/pull/1280
/verify with https://github.com/intel/llvm-test-suite/pull/1280
Can this be merged now?