Results 81 comments of Bryce Allen

I think this was fixed in 54251ecdd591e6f3204a0b86d52c0944c7502958, and re-introduced in 5ef9f0b5de4adeca648b86e92813c52bd80865fa

Note that it's the version that takes an int array that is unsafe for device; the shape version is fine. I think the right solution is to re-apply the fix from...

I think for now we still want to support CUDA 10, so global use of C++17 is not an option yet. Using it in a SYCL ifdef'ed block is fine, though...

I am curious if it works with the EXPLICIT_KERNEL version, which uses `gt::launch` instead of array expressions. The errors are not obvious, but my first guesses are that the cuda_fp16.h...

Another way we could go about this, I could add sycl::half support for the intel backend to provide the basic test/type structure, and you could add the cuda and/or AMD...

Oh, I missed that; I would try building without `GTENSOR_USE_THRUST` set. In general, it may be safer to use CMake to build; looking at your log, I think you...

If you end up going down the route of adding some tests, feel free to submit a draft PR, and I can see what happens with sycl::half to try to...

Making GT_LAMBDA device-only is a major problem: it would break all host launches. It's not obvious to me how to work around that limitation - I wonder if there is a...

I wonder if implicit conversion operators from backend specific device type to an appropriate compiler specific host type would help here? Like half -> _Float16 when using CUDA + g++.

I think it might make sense for me to try this out with sycl::half and _Float16, for sycl and host backends respectively. I suspect those to be easy, and then...