Patrick Devine

426 comments by Patrick Devine

@TheHoneyTree `num_ctx` changes the context size, which requires more memory and will definitely impact inference speed (particularly if part of the model is being offloaded to the CPU). This is expected...
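To give a rough feel for why a larger `num_ctx` costs memory, here is a back-of-the-envelope sketch of KV-cache size for a transformer. The function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions chosen for illustration, not values read from any particular model, and the real allocation in ollama may differ (compute buffers, cache quantization, and so on).

```python
def kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size in bytes.

    Each token stores one key and one value vector (hence the factor 2)
    per layer, each of size n_kv_heads * head_dim elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * num_ctx


# Hypothetical model shape, for illustration only.
for num_ctx in (2048, 8192, 32768):
    size = kv_cache_bytes(num_ctx, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"num_ctx={num_ctx:>6}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with `num_ctx`, so quadrupling the context quadruples this part of the memory footprint, which can push layers off the GPU.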

@pisoiu you can check the PCIe bus speed and lane width using `lspci -s <bus:device.function> -vvv` in Linux. Run plain `lspci` first to find the bus/device/function. As @rick-github mentioned, the cache...
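If you want to pull the link speed and width out of the verbose `lspci` output programmatically, something like the following might work. It is only a sketch: `parse_links` is a hypothetical helper, `SAMPLE` is made-up text in the usual `LnkCap`/`LnkSta` layout rather than output from a real machine, and the exact wording of those lines can vary between `pciutils` versions.

```python
import re

# Matches the "LnkCap:" (what the device supports) and "LnkSta:" (what was
# actually negotiated) lines; "LnkCap2"/"LnkSta2" are deliberately not matched.
LINK_RE = re.compile(
    r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)\s*GT/s.*?Width\s+x(\d+)"
)


def parse_links(lspci_text):
    """Return {'LnkCap': (speed_gt_s, width), 'LnkSta': (speed_gt_s, width)}."""
    links = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        links[kind] = (float(speed), int(width))
    return links


# Made-up sample text, for illustration only.
SAMPLE = (
    "\t\tLnkCap:\tPort #0, Speed 8GT/s, Width x16, ASPM L0s L1\n"
    "\t\tLnkSta:\tSpeed 2.5GT/s (downgraded), Width x4 (downgraded)\n"
)

print(parse_links(SAMPLE))
```

Comparing `LnkSta` against `LnkCap` shows whether the card negotiated a slower or narrower link than it supports. Note that `lspci` may need to be run as root to show these capability fields.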

@pisoiu I think it's saying the lane width is x4, so the theoretical maximum would be 1 GB/s?
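For reference, the theoretical figure depends on the PCIe generation as well as the lane count. A quick sketch of the arithmetic, using the published per-lane transfer rates and line encodings for generations 1 through 4 (the table name `PCIE_GENS` and the function `pcie_bandwidth_gbs` are just illustrative names, and the result is per direction, before protocol overhead):

```python
# generation -> (transfer rate in GT/s per lane, line-encoding efficiency)
PCIE_GENS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gbs(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    rate_gt, efficiency = PCIE_GENS[gen]
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return rate_gt * efficiency / 8 * lanes


for gen in sorted(PCIE_GENS):
    print(f"Gen{gen} x4: {pcie_bandwidth_gbs(gen, 4):.2f} GB/s")
```

So 1 GB/s is what an x4 link gives at Gen1 speeds (2.5 GT/s); the same x4 link at Gen3 would be close to 4 GB/s.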

@LuisMalhadas These seem like very different issues. I can assure you we take every one of them seriously.

@MikeB2019x what's the output of `journalctl -u ollama.service`? The only service you showed was `zfs-import-scan.service`, which is unrelated to ollama.