Josh Leverette
@jmorganca I also think it is very important to emphasize that the memory usage of a given context size is not actually constant. *Something* is being allocated only when the...
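(For anyone who wants to see this for themselves, here is a minimal sketch of how you might watch VRAM usage grow as the context fills. It assumes ollama is serving on its default port, an NVIDIA GPU with `nvidia-smi` on PATH, and a placeholder model name — none of these details are from the logs above.)

```python
# Rough sketch: send progressively longer prompts to a local ollama
# instance and record GPU memory usage after each one. Assumes ollama
# is listening on localhost:11434 and nvidia-smi is available; the
# model name below is just a placeholder.
import subprocess
import requests

MODEL = "mixtral:8x7b-instruct-v0.1-q3_K_S"  # placeholder

def used_vram_mib() -> int:
    """Return total VRAM in use (MiB) as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return sum(int(line) for line in out.split())

for tokens in (256, 1024, 4096, 16384):
    # Crude way to vary prompt length; the model's configured context
    # window still caps how much actually ends up in the KV cache.
    prompt = "word " * tokens
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    print(f"~{tokens:>6} prompt tokens -> {used_vram_mib()} MiB used")
```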
Here is the complete log for an OOM on v0.1.20 using mixtral:8x7b-instruct-v0.1-q3_K_S:

```
Jan 12 05:57:12 cognicore ollama[161484]: 2024/01/12 05:57:12 gpu.go:135: CUDA Compute Capability detected: 8.6
Jan 12 05:57:12 cognicore...
```
Thanks! ollama is great software! I look forward to being able to use larger models like Mixtral effectively again!
@jmorganca I tested the latest pre-release of 0.1.21 using one of my test cases that could consistently cause an OOM, and it seems like this issue is fixed for me....
@kennethwork101 rebooting should make no difference as far as ollama is concerned. It sounds like you have other apps that are using VRAM on your GPU, causing ollama's calculations to...
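(A quick way to check this: list what else is holding VRAM before starting ollama. Here's a small sketch; it assumes an NVIDIA GPU with `nvidia-smi` installed, and the fields queried are my choice, not anything ollama itself does.)

```python
# Sketch: list how much VRAM other processes are holding, which is the
# kind of usage that can throw off ollama's free-memory estimate.
# Assumes nvidia-smi is on PATH.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader,nounits"],
    text=True,
)
total = 0
for line in out.strip().splitlines():
    pid, name, mib = (field.strip() for field in line.split(","))
    total += int(mib)
    print(f"pid {pid:>7}  {name:<40} {mib:>8} MiB")
print(f"total VRAM held by compute apps: {total} MiB")
```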
llama.cpp claims to support Refact, possibly [added in this PR](https://github.com/ggerganov/llama.cpp/pull/3329) and [discussed here](https://github.com/ggerganov/llama.cpp/issues/3061). So I would expect this to be relatively straightforward now, but I haven't tested it myself yet,...
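(If anyone wants to try it, here's roughly the workflow I'd expect: wrap a Refact GGUF in a Modelfile and register it with ollama. The file name and model name are placeholders, and I haven't verified this path works for Refact specifically.)

```python
# Sketch: register a local Refact GGUF with ollama by writing a
# minimal Modelfile and running `ollama create`. The GGUF path and
# model name are placeholders; untested for Refact specifically.
import pathlib
import subprocess

gguf = "./refact-1.6b-q4_k_m.gguf"  # placeholder file name
pathlib.Path("Modelfile").write_text(f"FROM {gguf}\n")

subprocess.run(["ollama", "create", "refact", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "refact", "Write a hello world in Go."],
               check=True)
```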
I can confirm that Tailscale breaks Steam Remote Play on Windows 11 when trying to communicate over the LAN from one device to another, without intentionally involving Tailscale in any...
@mitar are you one of the maintainers here? These things would only matter if the maintainers were reviewing this PR... so far, they've shown no inclination to accept any of...
We don’t know what the maintainers care about.
Also, I just saw the edit from @joliss, but my conclusion was that the paper’s findings were broadly applicable. My interpretation was that they fine-tuned the final layer for a particular...