Henri Vasserman
I think most of the bigger buffers are RO; RW is used for reading the results back from the GPU, and those are usually smaller (`n_batch * n_embd`).
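For illustration, here is a minimal sketch of that split using the OpenCL C API. The helper name, struct, and sizes are assumptions for the example, not llama.cpp's actual allocation code:

```cpp
#include <CL/cl.h>
#include <cstddef>

struct gpu_buffers {
    cl_mem weights;  // large, only read by the kernels
    cl_mem results;  // small, written on the GPU and read back on the host
};

// Hypothetical helper, not taken from llama.cpp.
gpu_buffers make_buffers(cl_context ctx, size_t weight_bytes,
                         size_t n_batch, size_t n_embd) {
    cl_int err = 0;
    gpu_buffers b{};

    // The big weight tensors are never written by the device,
    // so they can live in read-only buffers.
    b.weights = clCreateBuffer(ctx, CL_MEM_READ_ONLY,
                               weight_bytes, nullptr, &err);

    // The result buffer is written by the kernel and read back with
    // clEnqueueReadBuffer; it only needs n_batch * n_embd floats.
    b.results = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                               n_batch * n_embd * sizeof(float),
                               nullptr, &err);
    return b;
}
```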
How did you compile llama.cpp? What compiler? Did you get any errors?
What was the command that you used to quantize? It should be `./build/bin/quantize ./models/ggml-model-f16.bin ./models/ggml-model-q4_0.bin 2` or similar, assuming you are in the llama.cpp root and that your CMake build...
Alpaca uses special formatting to separate instructions and data. You can see the [templates used for tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/templates/alpaca.json). There are two variants, one with just an instruction, and one with an instruction and...
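For reference, the instruction-only variant can be built roughly like this. This is a sketch with a hypothetical helper; see the linked `alpaca.json` for the exact wording:

```cpp
#include <string>

// Hypothetical helper that formats a prompt in the instruction-only
// Alpaca style from the tloen/alpaca-lora template linked above.
std::string alpaca_prompt(const std::string & instruction) {
    return
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n" + instruction + "\n\n"
        "### Response:\n";
}
```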
The provided Windows build with CLBlast using OpenCL should work, but I wouldn't expect any significant performance gains from integrated graphics.
> copied OpenBLAS required file in the folders
> then I followed the "Intel MKL" section

Which one did you actually use? Did it actually find the Intel MKL library?...
> Maybe the GPU only accelerates 16bit operations, so the CPU is faster because it can run the 4bit stuff...?

The OpenCL code in llama.cpp can run 4-bit generation on...
> the good thing is that you don't need to copy data vram->ram to access the data on cpu, it's just always shared by both

llama.cpp is not optimized for...
If you had a dedicated GPU, bringing prompt evaluation down below 60 s for 1000 tokens would be very much doable.
There is [train-text-from-scratch](https://github.com/ggerganov/llama.cpp/tree/master/examples/train-text-from-scratch), but it's still early days.