Results 81 comments of Bryce Allen

I think this was fixed in 54251ecdd591e6f3204a0b86d52c0944c7502958, and re-introduced in 5ef9f0b5de4adeca648b86e92813c52bd80865fa

Note that it's the version that takes an int array that is unsafe for device; the shape version is fine. I think the right solution is to re-apply the fix from...

I think for now we still want to support CUDA 10, so global use of C++17 is not an option yet. Using it in a SYCL ifdef'ed block is fine, though...

I am curious if it works with the EXPLICIT_KERNEL version, which uses `gt::launch` instead of array expressions. The errors are not obvious, but my first guesses are that the cuda_fp16.h...

Another way we could go about this, I could add sycl::half support for the intel backend to provide the basic test/type structure, and you could add the cuda and/or AMD...

Oh, I missed that; I would try building without `GTENSOR_USE_THRUST` set. In general, it may be safer to use CMake to build; looking at your log, I think you...

If you end up going down the route of adding some tests, feel free to submit a draft PR, and I can see what happens with sycl::half to try to...

Making GT_LAMBDA device-only is a major problem: it would break all host launches. It's not obvious to me how to work around that limitation - I wonder if there is a...

I wonder if implicit conversion operators from backend specific device type to an appropriate compiler specific host type would help here? Like half -> _Float16 when using CUDA + g++.

I think it might make sense for me to try this out with sycl::half and _Float16, for sycl and host backends respectively. I suspect those to be easy, and then...