Dampfinchen

53 comments by Dampfinchen

> 6 is your entire GPU; leave some room for the browser and windows/xorg/etc. I know. I've only tried 6 GB because 5 GB (which `--auto-devices` is setting it to)...

> set it to 4. It doesn't come fine-tuned, I think. You have to do it all yourself. I had issues running pygmalion-6b on 4 GB until I set it to 2, and...

> I have heard this "oobabooga can't do memory management" meme quite a few times, usually from users who try a single value of `--gpu-memory`, get an error, and then...
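
The quoted exchange describes stepping `--gpu-memory` down until the model fits; a minimal sketch of that approach, assuming text-generation-webui's usual `server.py` launcher (the specific values are illustrative, not recommendations):

```sh
# If one value hits CUDA OOM, lower the budget and retry until loading succeeds.
python server.py --auto-devices --gpu-memory 5
python server.py --auto-devices --gpu-memory 4
```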

> I have added two options for finer VRAM control:
>
> 1. `--gpu-memory` with explicit units (as @Ph0rk0z suggested). This now works: `--gpu-memory 3457MiB`
> 2. `--no-cache`. This reduces...
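
A hedged illustration of the two options combined, again assuming the standard `server.py` entry point (the 3457MiB figure comes straight from the quote; combining it with `--no-cache` is my own example):

```sh
# Explicit MiB units allow a finer budget than whole-GiB steps;
# --no-cache trades some speed for lower VRAM use.
python server.py --gpu-memory 3457MiB --no-cache
```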

> I will say this: on Windows I've never gotten ooba to offload anything to sysram or even shared sysram without using deepspeed (despite the _agony_ that is getting deepspeed...

This model is insane for its size.

Hmm, this might be the reason I'm seeing reports from people saying imatrix doesn't work properly with Llama 3 models yet (low-quality output).

@ikawrakow Any idea what could cause this? Have you done any tests so far regarding imatrix and IQ quants for Llama 3?
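
For context, a minimal sketch of the imatrix-then-quantize workflow under discussion, using the tool names from current llama.cpp builds (`llama-imatrix` and `llama-quantize`; older builds called them `imatrix` and `quantize`). The file names are placeholders:

```sh
# 1. Collect an importance matrix over a calibration text.
./llama-imatrix -m llama3-8b-f16.gguf -f calibration.txt -o imatrix.dat
# 2. Quantize to an IQ format using that matrix.
./llama-quantize --imatrix imatrix.dat llama3-8b-f16.gguf llama3-8b-iq2_xs.gguf IQ2_XS
```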

IMO, Ollama can be a nuisance for people who already have GGUF files because it requires them to be converted first. It also has no GUI, in contrast to koboldcpp and...
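
For reference, the import step that comment complains about — a minimal sketch using Ollama's Modelfile mechanism (the model name and file path are placeholders):

```sh
# A local GGUF has to be registered with Ollama before it can be run.
echo 'FROM ./llama3-8b-iq2_xs.gguf' > Modelfile
ollama create my-llama3 -f Modelfile
ollama run my-llama3
```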