Dampfinchen
> 6 is your entire GPU, leave some room for the browser and windows/xorg/etc.

I know. I've only tried 6 GB because 5 GB (which is what `--auto-devices` sets it to)...
> set it to 4. It doesn't come fine-tuned, I think; you have to do it all yourself. I had issues running pygmalion-6b on 4 GB until I set it to 2, and...
> I have heard this "oobabooga can't do memory management" meme quite a few times, usually from users who try a single value of `--gpu-memory`, get an error, and then...
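To make that concrete, the intended workflow is to step the limit down until the model loads without an OOM error, rather than giving up after one value; a minimal sketch, assuming a stock text-generation-webui checkout (the model name is illustrative):

```
# Try successively lower VRAM caps (in GiB) until loading succeeds
python server.py --model pygmalion-6b --gpu-memory 6
python server.py --model pygmalion-6b --gpu-memory 5
python server.py --model pygmalion-6b --gpu-memory 4
```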
> I have added two options for finer VRAM control:
>
> 1. `--gpu-memory` with explicit units (as @Ph0rk0z suggested). This now works: `--gpu-memory 3457MiB`
> 2. `--no-cache`. This reduces...
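Combined, that would look like this; a minimal sketch assuming the same `server.py` entry point (model name illustrative):

```
# Cap VRAM at an exact MiB value and drop the cache to trade speed for memory
python server.py --model pygmalion-6b --gpu-memory 3457MiB --no-cache
```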
> I will say this: on Windows I've never gotten ooba to offload anything to sysram or even shared sysram without using deepspeed (despite the _agony_ that is getting deepspeed...
This model is insane for its size.
Hmm, this might be the reason I'm seeing reports from people saying imatrix doesn't work properly with Llama 3 models yet (low-quality output).
@ikawrakow Any idea what could cause this? Have you done any tests so far regarding imatrix and IQ quants for Llama 3?
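For reference, the imatrix-based IQ quantization flow in question goes roughly like this; a sketch assuming llama.cpp binaries built as `imatrix` and `quantize` (newer builds name them `llama-imatrix` and `llama-quantize`), with the model and calibration file names as placeholders:

```
# 1. Compute an importance matrix over a calibration text
./imatrix -m llama-3-8b-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Feed it into quantization to produce an IQ quant
./quantize --imatrix imatrix.dat llama-3-8b-f16.gguf llama-3-8b-IQ2_XS.gguf IQ2_XS
```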
IMO, Ollama can be a nuisance for people who already have GGUF files, because it requires them to be converted into its own format. It also has no GUI, in contrast to koboldcpp and...
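For context, the extra import step being referred to looks like this; a minimal sketch, with the file and model names as placeholders:

```
# Modelfile: point Ollama at an existing GGUF on disk
FROM ./llama-3-8b-IQ2_XS.gguf
```

```
# Register and run it (Ollama copies the weights into its own blob store)
ollama create my-llama3 -f Modelfile
ollama run my-llama3
```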