stduhpf
It's 314B int8 parameters, so you would need 314GB of memory to load the model, plus some more for things like the K/V cache
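A quick sketch of that back-of-the-envelope estimate, assuming the "B parameters → GB" shorthand (1 byte per int8 weight, 1 GB ≈ 1e9 bytes); the parameter count and byte widths here are illustrative, not measured:

```python
def weights_memory_gb(n_params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Memory needed for the weights alone, in GB.

    int8 quantization stores roughly 1 byte per parameter, so a model with
    N billion parameters needs about N GB just to hold the weights,
    before accounting for the K/V cache and activation buffers.
    """
    return n_params_billions * bytes_per_param

print(weights_memory_gb(314))        # int8: 314 GB for the weights
print(weights_memory_gb(314, 2.0))   # fp16 for comparison: 628 GB
```

The K/V cache comes on top of this and grows with context length and batch size, which is why the comment says "plus some more".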
I have observed the same thing. It seems that the image embeddings are not generated properly with Vulkan. Related issue: https://github.com/ggerganov/llama.cpp/issues/5545
This is expected. The Vulkan backend doesn't support all features yet (including the Mixtral architecture). I think this is a documentation issue; it should be made clearer which features to...
> are there plans for vulkan backend to support Mixtral in the near future? I believe so: https://github.com/ggerganov/llama.cpp/pull/5835#issuecomment-1974877433
Ok, I did some debugging, and it seems that the generation of the embeddings with CLIP is what is broken here. Which confuses me, because as far as I can...
@0cc4m On your branch, the answers are no longer complete gibberish, but they are still very incoherent, completely unrelated to the actual image, and/or in the wrong language. I can confirm...
> > I didn't intentionally fix this yet, but maybe I found and fixed some related issue in the meantime. Let me know when you have more details. > >...
Seems to be fixed for me since befddd0f15de6efb15d7e7f5b527dfb671f4196f
I have a similar issue with a CLBlast build on Windows, on my rx5700XT. Offloading layers to GPU causes a very significant slowdown, even compared to my slow CPU. ```bash...
I got it to compile and install, and I can generate text much faster with it compared to CLBlast, but sadly the save state doesn't work; I will probably open...