Josh Leverette
@jmorganca I also think it is very important to emphasize that the memory usage of a given context size is not actually constant. *Something* is being allocated only when the...
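(For anyone who wants to see this for themselves, here is a minimal sketch of how you might watch VRAM usage grow as the context fills. It assumes ollama is serving on its default port, an NVIDIA GPU with `nvidia-smi` on PATH, and a placeholder model name — none of these details are from the logs above.)

```python
# Rough sketch: send progressively longer prompts to a local ollama
# instance and record GPU memory usage after each one. Assumes ollama
# is listening on localhost:11434 and nvidia-smi is available; the
# model name below is just a placeholder.
import subprocess
import requests

MODEL = "mixtral:8x7b-instruct-v0.1-q3_K_S"  # placeholder

def used_vram_mib() -> int:
    """Return total VRAM in use (MiB) as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return sum(int(line) for line in out.split())

for tokens in (256, 1024, 4096, 16384):
    # Crude way to vary prompt length; the model's configured context
    # window still caps how much actually ends up in the KV cache.
    prompt = "word " * tokens
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    print(f"~{tokens:>6} prompt tokens -> {used_vram_mib()} MiB used")
```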
Here is the complete log for an OOM on v0.1.20 using mixtral:8x7b-instruct-v0.1-q3_K_S:

```
Jan 12 05:57:12 cognicore ollama[161484]: 2024/01/12 05:57:12 gpu.go:135: CUDA Compute Capability detected: 8.6
Jan 12 05:57:12 cognicore...
```
Thanks! ollama is great software! I look forward to being able to use larger models like Mixtral effectively again!
@jmorganca I tested the latest pre-release of 0.1.21 using one of my test cases that could consistently cause an OOM, and it seems like this issue is fixed for me....
@kennethwork101 rebooting should make no difference as far as ollama is concerned. It sounds like you have other apps that are using VRAM on your GPU, causing ollama's calculations to...
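(A quick way to check this: list what else is holding VRAM before starting ollama. Here's a small sketch; it assumes an NVIDIA GPU with `nvidia-smi` installed, and the fields queried are my choice, not anything ollama itself does.)

```python
# Sketch: list how much VRAM other processes are holding, which is the
# kind of usage that can throw off ollama's free-memory estimate.
# Assumes nvidia-smi is on PATH.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader,nounits"],
    text=True,
)
total = 0
for line in out.strip().splitlines():
    pid, name, mib = (field.strip() for field in line.split(","))
    total += int(mib)
    print(f"pid {pid:>7}  {name:<40} {mib:>8} MiB")
print(f"total VRAM held by compute apps: {total} MiB")
```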
llama.cpp claims to support Refact, possibly [added in this PR](https://github.com/ggerganov/llama.cpp/pull/3329) and [discussed here](https://github.com/ggerganov/llama.cpp/issues/3061). So I would expect this to be relatively straightforward now, but I haven't tested it myself yet,...
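(If anyone wants to try it, here's roughly the workflow I'd expect: wrap a Refact GGUF in a Modelfile and register it with ollama. The file name and model name are placeholders, and I haven't verified this path works for Refact specifically.)

```python
# Sketch: register a local Refact GGUF with ollama by writing a
# minimal Modelfile and running `ollama create`. The GGUF path and
# model name are placeholders; untested for Refact specifically.
import pathlib
import subprocess

gguf = "./refact-1.6b-q4_k_m.gguf"  # placeholder file name
pathlib.Path("Modelfile").write_text(f"FROM {gguf}\n")

subprocess.run(["ollama", "create", "refact", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "refact", "Write a hello world in Go."],
               check=True)
```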
I can confirm that Tailscale breaks Steam Remote Play on Windows 11 when trying to communicate over the LAN from one device to another, without intentionally involving Tailscale in any...
@mitar are you one of the maintainers here? These things would only matter if the maintainers were reviewing this PR... so far, they've shown no inclination to accept any of...
We don’t know what the maintainers care about.
Also, I just saw the edit from @joliss, but my conclusion was that the paper’s findings were broadly applicable. My interpretation was that they fine-tuned the final layer for a particular...