zrm
> The strange thing is that I cannot make my system go faster with 36 threads. I'm trying to guess what causes this, but it's hard without access to the...
> The reason is that I am still not 100% convinced that this always improves performance, so it has to be optional. I'm not sure this is always optimal either,...
> It seems that `--interleave=all` with 36 threads does help, but it still is no better than 18 threads. This is starting to look like it *is* the...
There is now a new version of this that adds a `--numa` option. Unless it is specified, `ggml_numa_init()` is not called, so `numa.n_nodes` remains 0 and the...
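To see how many NUMA nodes a Linux box actually exposes, and so whether a node count of 0 or 1 is expected, something like the following rough Python sketch can help. It is not the `ggml_numa_init()` code: the helper name `count_numa_nodes` is made up for illustration, and it only assumes the usual Linux sysfs layout where each node appears as a `node<N>` directory under `/sys/devices/system/node`.

```python
import os
import re


def count_numa_nodes(base="/sys/devices/system/node"):
    """Count directories named node<N> under `base`.

    Hypothetical helper for illustration only. Returns 0 when `base`
    does not exist (for example on non-Linux systems).
    """
    try:
        entries = os.listdir(base)
    except OSError:
        return 0
    return sum(
        1
        for name in entries
        if re.fullmatch(r"node\d+", name)
        and os.path.isdir(os.path.join(base, name))
    )


if __name__ == "__main__":
    print(f"NUMA nodes visible: {count_numa_nodes()}")
```

On a two-socket machine this would normally report 2; a single-socket desktop typically reports 1.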
LLaMA 65B-f16 is ~122GB. You can get 128GB of DDR4-2400 for around $150 and a 2S 8-channel Xeon E5 v4 to put it in for around $200. How much is...
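The size figure can be sanity-checked with a little arithmetic. This is a minimal sketch that assumes roughly 65.2 billion parameters at 2 bytes each (f16) and ignores file metadata, the KV cache, and scratch buffers; the function name `f16_weight_gib` is made up for illustration.

```python
GIB = 2**30  # bytes per GiB


def f16_weight_gib(n_params, bytes_per_param=2):
    """Approximate size of the weights alone, in GiB (hypothetical helper)."""
    return n_params * bytes_per_param / GIB


if __name__ == "__main__":
    n_params = 65.2e9  # assumed parameter count for the 65B model
    size = f16_weight_gib(n_params)
    print(f"f16 weights: {size:.1f} GiB")
    print(f"headroom in 128 GiB of RAM: {128 - size:.1f} GiB")
```

Under these assumptions the weights land a little above 121 GiB, so they fit in 128GB with only a few GiB left for everything else.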
> echo 1 > /proc/sys/vm/numa_interleave

$ numactl --interleave=all ./main -n 512 -m models/65B/ggml-model-q4_0.bin --ignore-eos **-p "Someone told me there is a configuration option called \"/proc/sys/vm/numa_interleave\" -- what"** -t 32 ......