ardfork

Results 13 comments of ardfork

In that case it's because you check for `__CUDA_ARCH__ < 700` for both atomicAdd half and half2 when half2 should be `__CUDA_ARCH__ < 600`. From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd: > The 32-bit __half2...
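A minimal sketch of the guard split described above, assuming the thresholds from the CUDA programming guide (native `__half2` atomicAdd from compute capability 6.x, native `__half` atomicAdd from 7.x). The wrapper name `atomic_add_half2_compat` is hypothetical and not taken from the project under discussion. The fallback is a generic compare-and-swap loop, not necessarily what the original code does:

```cuda
#include <cuda_fp16.h>

// Hypothetical wrapper: use the native __half2 atomicAdd where it exists
// (compute capability >= 6.0), otherwise emulate it with a 32-bit CAS loop.
// A __half wrapper would follow the same pattern with a 700 threshold.
__device__ __forceinline__ void atomic_add_half2_compat(__half2 *addr, __half2 val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
    unsigned int *word = reinterpret_cast<unsigned int *>(addr);
    unsigned int old = *word;
    unsigned int assumed;
    do {
        assumed = old;
        // Do the addition in float so no native half arithmetic is required.
        float2 cur = __half22float2(*reinterpret_cast<__half2 *>(&assumed));
        float2 inc = __half22float2(val);
        __half2 sum = __floats2half2_rn(cur.x + inc.x, cur.y + inc.y);
        old = atomicCAS(word, assumed, *reinterpret_cast<unsigned int *>(&sum));
    } while (assumed != old);  // retry if another thread changed the word
#else
    atomicAdd(addr, val);  // native 32-bit __half2 atomic add
#endif
}
```

With a single `__CUDA_ARCH__ < 700` guard for both types, devices of compute capability 6.x would take the emulated path for `__half2` even though the native instruction is available.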

I tried to add ROCm HIP compatibility, but it errors with: ``` ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon RX 6700 XT, compute...

With your changes, it now works with ROCm HIP (with the patch below), but it is slower, so it is likely not worth enabling on that platform. I'm using an RX...

From the error, it seems like you are missing hipSPARSE on your system. I wasn't able to check whether it is available in your distro's repos. If they are not,...

The error is quite clear: `fatal error: 'hipsparse/hipsparse.h' file not found`. Since it is finding hipcc, I don't think the problem is locating your ROCm directory. Verify that...

Most distros should have a package group like `rocm-hip-sdk`; at least the AMD repos for Ubuntu/RHEL/SUSE and the Arch repo name it that way. That group will install all the...

This issue is missing information. Please share the commands used to build llama.cpp, the output of rocminfo, and the full output of llama.cpp.

I guess I forgot to answer here. This is the same issue as #173, which was fixed upstream and will be available in the next ROCm version. Note that exllama v2 is...

This doesn't work for me. It is still spouting gibberish with -fh2. I also do not understand your change to `__compat_h2rcp`; it's just a backport from ROCm 5.6 for those that use...

I spent a bit more time testing your patch. It seems to be a bit more coherent or at least different than without it. Without patch: ``` -- Testing 8...