Rodrigo
> Great, thanks! I have a 12GB VRAM GPU; does it also work for training? Currently I can't train: even with a batch size of 1 I get OOM...
EXL2 is one of the most popular quantization formats, alongside GGUF. It would be amazing to have support for it. I can help with a PR but to be honest...
Thank you @EricLBuehler! Right now I am just exploring some existing implementations (e.g. https://github.com/chu-tianxiang/vllm-gptq/tree/exl2) and trying to see how we could fit it in.
@EricLBuehler do you think it would be a good idea to create a new branch for EXL2 development? I have started on some parts, but it is only a draft for...
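For readers unfamiliar with these formats, here is a minimal C sketch of the group-quantization idea that GPTQ-style formats such as EXL2 build on: low-bit integer weights with one scale and zero point per group of consecutive values. This is illustrative only; the group size, the unpacked one-value-per-byte storage, and the `dequant_q4` helper are assumptions for this example, and real EXL2 uses variable bit widths and a packed on-disk layout.

```c
#include <stdio.h>
#include <stdint.h>

#define GROUP_SIZE 4  // real formats typically use 32 or 128

// Dequantize n 4-bit values (stored one per byte here for clarity;
// on disk they would be packed two per byte).
static void dequant_q4(const uint8_t *q, const float *scales,
                       const uint8_t *zeros, float *out, int n) {
    for (int i = 0; i < n; i++) {
        int g = i / GROUP_SIZE;  // group this weight belongs to
        out[i] = scales[g] * ((int)q[i] - (int)zeros[g]);
    }
}

int main(void) {
    uint8_t q[8]      = {0, 5, 8, 15, 3, 7, 9, 12};
    float   scales[2] = {0.10f, 0.05f};  // one scale per group
    uint8_t zeros[2]  = {8, 8};          // one zero point per group
    float   w[8];
    dequant_q4(q, scales, zeros, w, 8);
    for (int i = 0; i < 8; i++) printf("%.3f ", w[i]);
    printf("\n");
    return 0;
}
```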
This issue is independent of the split parameter. With only a 3080 I can load 40 layers of the model (no split parameter); with two RTX cards, OOM ... (also, no...
I am able to load the model with:

```
llama-server -m /mnt/models/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL/DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf --threads 28 --host 0.0.0.0 --port 5001 -c 8192 -ngl 99 -ot exps=CPU
```

| PID | DEV |...
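As an aside, a minimal sketch of what a name-based override like `-ot exps=CPU` amounts to conceptually: tensors whose names match the pattern stay on the CPU backend, while everything else is offloaded. The `place_tensor` helper and the plain substring match are assumptions for illustration, not the actual llama.cpp implementation (which matches patterns against tensor names).

```c
#include <stdio.h>
#include <string.h>

typedef enum { DEV_CPU, DEV_GPU } device_t;

// Keep tensors whose name matches the pattern (e.g. MoE expert
// weights, whose names contain "exps") on the CPU; offload the rest.
static device_t place_tensor(const char *name, const char *pattern) {
    return strstr(name, pattern) != NULL ? DEV_CPU : DEV_GPU;
}

int main(void) {
    const char *tensors[] = {
        "blk.0.attn_q.weight",
        "blk.0.ffn_gate_exps.weight",  // MoE expert tensor
        "blk.0.ffn_down_exps.weight",
        "output.weight",
    };
    for (size_t i = 0; i < sizeof(tensors) / sizeof(tensors[0]); i++) {
        device_t d = place_tensor(tensors[i], "exps");
        printf("%-28s -> %s\n", tensors[i], d == DEV_CPU ? "CPU" : "GPU");
    }
    return 0;
}
```

The effect is that the large, sparsely used expert weights stay in system RAM while the dense attention layers fit on the GPU.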
> Maybe try `-ngl 61` to keep the output layer on the CPU too (that oddly worked for me earlier when I was having trouble with the RPC stuff).

No...
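For readers following along, a hedged sketch of the layer-offload arithmetic behind that suggestion, assuming a model with 61 repeating blocks plus a separate output layer; the counts and the cutoff rule here are assumptions for illustration, not the actual llama.cpp logic.

```c
#include <stdio.h>

int main(void) {
    const int n_blocks = 61;  // repeating transformer blocks (assumed)
    for (int ngl = 60; ngl <= 62; ngl++) {
        // The first ngl repeating blocks go to the GPU; only if the
        // count exceeds the block count does the output layer move too.
        int blocks_on_gpu = ngl < n_blocks ? ngl : n_blocks;
        int output_on_gpu = ngl > n_blocks;
        printf("-ngl %d: %d/%d blocks on GPU, output layer on %s\n",
               ngl, blocks_on_gpu, n_blocks, output_on_gpu ? "GPU" : "CPU");
    }
    return 0;
}
```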
> It's trying to allocate a tensor of size 2^64, which suggests there is an integer overflow somewhere. If you set the environment variable `GGML_SCHED_DEBUG=2`, it will print the graph...
> Ok nvm, I think I see the problem. I will push a possible fix soon.

I confirm that the fix worked, thank you @slaren. For the record, I am...
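To illustrate the kind of bug @slaren describes, here is a minimal C sketch of how a signed overflow can surface as a ~2^64 allocation request. The dimensions are made up and this is not the actual ggml code path; it only shows how a negative intermediate value wraps when converted to an unsigned `size_t`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    // Tensor dimensions multiplied in a type that is too narrow
    // (signed overflow is undefined behaviour in C; on typical
    // hardware it wraps to a negative value).
    int32_t ne0 = 129280;           // hypothetical dimensions
    int32_t ne1 = 18432;
    int32_t nelem = ne0 * ne1;      // overflows int32_t -> negative

    // Converting the negative result to size_t wraps modulo 2^64,
    // so the allocator sees a request of almost 2^64 bytes.
    size_t request = (size_t)(int64_t)nelem * sizeof(float);

    printf("nelem   = %lld\n", (long long)nelem);  // negative
    printf("request = %zu bytes\n", request);      // ~2^64
    void *p = malloc(request);                     // fails, returns NULL
    printf("malloc returned %p\n", p);
    free(p);
    return 0;
}
```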