4onen
Appears to be a bug. Line 167 of url/Url.elm indexes the first `:` in the host of the URL and, if it detects more than one `:`, assumes the URL...
Sketchy performance comparison on my laptop to show why `--override-tensors` helps MoE models. I set longer context lengths than the standard `llama-bench` to emphasize why keeping the attention operations on...
PR #12891 has resolved my issue running flash attention and override-tensors with DeepSeek-V2-Lite. Some performance numbers for that, on the same hardware as my last set: **CPU Only** (Used 0.8GB of VRAM...
Ran another set of experiments on another device (RTX 3070 and an AMD Ryzen 7 5800X 8-Core with two sticks of 2133MHz DDR4) **CPU Only** (Used 836MB of VRAM during...
Got it @slaren. As for splitting the test grid entries, would you prefer that I use semicolons instead of commas the same way that we do for tensor split? Or...
I've implemented the behaviour the same way as tensor-split, for now. That is, `;` is now the internal separator for different overrides and `,` is now the separator between test...
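To illustrate the two-level separator scheme described above, here is a minimal Python sketch, not the actual `llama-bench` code. It assumes each override is written as `pattern=buffer`; the function name `parse_override_grid` and the buffer names in the usage example are hypothetical.

```python
def parse_override_grid(arg: str) -> list[list[tuple[str, str]]]:
    """Split a grid argument into test entries, then each entry into overrides.

    Assumed format (illustrative only): `,` separates test entries and
    `;` separates the overrides inside one entry, each `pattern=buffer`.
    """
    grid = []
    for entry in arg.split(","):        # `,` separates test grid entries
        overrides = []
        for item in entry.split(";"):   # `;` separates overrides in one entry
            if not item:
                continue                # an empty item means "no override"
            # Split on the last `=` so a pattern may itself contain `=`.
            pattern, sep, buffer_type = item.rpartition("=")
            if not sep or not pattern or not buffer_type:
                raise ValueError(f"expected pattern=buffer, got {item!r}")
            overrides.append((pattern, buffer_type))
        grid.append(overrides)
    return grid


if __name__ == "__main__":
    # Two test entries: the first has two overrides, the second has one.
    print(parse_override_grid("exps=CPU;attn=CUDA0,exps=CPU"))
```

One consequence of this layout is that a pattern containing a literal `,` or `;` cannot be expressed without some escaping rule, which this sketch does not attempt.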
I understand now why all of the other functions in that file were marked `static`. I'll see if I can get my Linux desktop up and make sure I run...
All I can say is the CPU CI ran to completion on my Ubuntu 22.04 machine with no errors I was aware of. I'll try to take a look at...
Tried the Vulkan CI (because I can't run the CUDA CI on my desktop with my nvcc, apparently) and that failed on an unused parameter in a file my change...
Adding `CMAKE_CUDA_ARCHITECTURES=86` (for the 3070 in my desktop) resulted in the same message. It's possible that my driver and NVCC CUDA versions are desynced, as `nvidia-smi` reports CUDA version 12.7....