ardfork comments

Results: 13 comments of ardfork

will it work with Nvidia P40 24GB on Linux?

In that case, it's because you check `__CUDA_ARCH__ < 700` for both the `half` and `half2` versions of atomicAdd, when the `half2` check should be `__CUDA_ARCH__ < 600`. From https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd: > The 32-bit __half2...
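The CUDA C++ Programming Guide gives different compute-capability requirements for the two half types. Native `atomicAdd` on 32-bit `__half2` needs 6.x or higher. Native `atomicAdd` on 16-bit `__half` needs 7.x or higher. A P40 is compute capability 6.1 (`__CUDA_ARCH__ == 610`), so only the `__half` case should need a fallback there.

The host-side C++ sketch below models that rule. The helper names are hypothetical and are not part of the CUDA API. In device code, the same thresholds would go into the `#if __CUDA_ARCH__ < ...` guards.

```cpp
// Hypothetical helpers (not CUDA API). They encode the minimum compute
// capability for native atomicAdd on half types, using the same encoding
// as __CUDA_ARCH__ (e.g. 610 for compute capability 6.1).
constexpr bool has_native_atomic_add_half2(int cuda_arch) {
    return cuda_arch >= 600;  // 32-bit __half2: compute capability 6.x and higher
}

constexpr bool has_native_atomic_add_half(int cuda_arch) {
    return cuda_arch >= 700;  // 16-bit __half: compute capability 7.x and higher
}

// Tesla P40 is compute capability 6.1.
constexpr int kP40Arch = 610;

// In device code this corresponds to guarding the half2 fallback with
// `#if __CUDA_ARCH__ < 600` and the half fallback with `#if __CUDA_ARCH__ < 700`.
static_assert(has_native_atomic_add_half2(kP40Arch), "half2 atomicAdd is native on sm_61");
static_assert(!has_native_atomic_add_half(kP40Arch), "half atomicAdd needs a fallback on sm_61");
```

With a single `< 700` guard on both types, a 6.x device like the P40 takes the `half2` fallback even though native support exists.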

DRAFT: Introduction of CUDA Graphs to LLama.cpp

Tried to add ROCm HIP compatibility, but it errors with:
```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6700 XT, compute...
```

DRAFT: Introduction of CUDA Graphs to LLama.cpp

With your changes, it now works with ROCm HIP (with the patch below), but it is slower, so it is likely not worth enabling on that platform. I'm using an RX...

Cannot compile exllama_ext on ROCm

From the error, it seems like you are missing hipSPARSE on your system. I wasn't able to check whether it is available in your distro's repositories. If they are not,...

Cannot compile exllama_ext on ROCm

The error is quite clear: `fatal error: 'hipsparse/hipsparse.h' file not found`. Since it is finding hipcc, I don't think the problem is locating your ROCm directory. Verify that...
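One quick way to confirm whether the header is installed is to look for it under the ROCm prefix. The C++17 sketch below assumes the prefix comes from the `ROCM_PATH` environment variable and falls back to `/opt/rocm` when that is unset. `/opt/rocm` is the usual install location for AMD's packages, but distro packages may place ROCm elsewhere. `hipsparse_header_path` and `hipsparse_header_installed` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: builds the path where <hipsparse/hipsparse.h> would be
// expected, assuming the ROCm prefix is taken from ROCM_PATH and defaults to
// /opt/rocm when the variable is unset or empty.
fs::path hipsparse_header_path(const char* rocm_path_env) {
    const fs::path prefix =
        (rocm_path_env != nullptr && *rocm_path_env != '\0') ? rocm_path_env : "/opt/rocm";
    return prefix / "include" / "hipsparse" / "hipsparse.h";
}

// Returns true if the header exists under the configured ROCm prefix.
bool hipsparse_header_installed() {
    return fs::exists(hipsparse_header_path(std::getenv("ROCM_PATH")));
}
```

If `hipsparse_header_installed()` returns false, the hipSPARSE development files are likely not installed under that prefix.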

Cannot compile exllama_ext on ROCm

Most distros should have a group like `rocm-hip-sdk`; at least the AMD repositories for Ubuntu/RHEL/SUSE and the Arch repositories name it that way. That group will install all the...

[User] AMD GPU slower than CPU

This issue is missing information. Please share the commands used to build llama.cpp, the output of `rocminfo`, and the full output of llama.cpp.

Multi-GPU issues

I guess I forgot to answer here: this is the same issue as #173, which was fixed upstream and will be available in the next ROCm version. Note that exllama v2 is...

Fix half2 with HIP

This doesn't work for me; it still produces gibberish with `-fh2`. I also don't understand your change to `__compat_h2rcp`: it's just a backport from ROCm 5.6 for those that use...
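A compatibility shim like `__compat_h2rcp` is typically gated on the HIP version. The comment above is truncated, so this sketch *assumes* the shim targets HIP/ROCm releases older than 5.6.

`hip/hip_version.h` defines `HIP_VERSION` as `HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH`. The host-side C++ sketch below only models that comparison. `hip_version` and `needs_compat_h2rcp` are hypothetical names.

```cpp
// Mirrors the HIP_VERSION encoding from hip/hip_version.h:
// MAJOR * 10000000 + MINOR * 100000 + PATCH.
constexpr long hip_version(long major, long minor, long patch) {
    return major * 10000000L + minor * 100000L + patch;
}

// Hypothetical helper: true when a backported h2rcp shim would be compiled in,
// assuming the backport is only needed before HIP/ROCm 5.6.
constexpr bool needs_compat_h2rcp(long version) {
    return version < hip_version(5, 6, 0);
}

// In HIP source the equivalent preprocessor guard would look like:
//   #if HIP_VERSION < 50600000
//   ... compat h2rcp definition ...
//   #endif
static_assert(hip_version(5, 6, 0) == 50600000L, "encoding matches HIP_VERSION");
static_assert(needs_compat_h2rcp(hip_version(5, 4, 0)), "pre-5.6 uses the shim");
static_assert(!needs_compat_h2rcp(hip_version(5, 6, 0)), "5.6 and later skip the shim");
```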

Fix half2 with HIP

I spent a bit more time testing your patch. The output seems a bit more coherent, or at least different from the output without it. Without the patch:
```
-- Testing 8...
```