Dampfinchen
> 6 is your entire GPU, leave some room for the browser and windows/xorg/etc.

I know. I've only tried 6 GB because 5 GB (which is what `--auto-devices` sets it to)...
> set it to 4. It doesn't come fine-tuned, I think; you have to do it all yourself. I had issues running pygmalion-6b on 4 GB until I set it to 2, and...
> I have heard this "oobabooga can't do memory management" meme quite a few times, usually from users who try a single value of `--gpu-memory`, get an error, and then...
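To make that concrete, the intended workflow is to step the limit down until the model loads without an OOM error, rather than giving up after one value; a minimal sketch, assuming a stock text-generation-webui checkout (the model name is illustrative):

```
# Try successively lower VRAM caps (in GiB) until loading succeeds
python server.py --model pygmalion-6b --gpu-memory 6
python server.py --model pygmalion-6b --gpu-memory 5
python server.py --model pygmalion-6b --gpu-memory 4
```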
> I have added two options for finer VRAM control:
>
> 1. `--gpu-memory` with explicit units (as @Ph0rk0z suggested). This now works: `--gpu-memory 3457MiB`
> 2. `--no-cache`. This reduces...
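Combined, that would look like this; a minimal sketch assuming the same `server.py` entry point (model name illustrative):

```
# Cap VRAM at an exact MiB value and drop the cache to trade speed for memory
python server.py --model pygmalion-6b --gpu-memory 3457MiB --no-cache
```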
> I will say this: on Windows I've never gotten ooba to offload anything to sysram or even shared sysram without using deepspeed (despite the _agony_ that is getting deepspeed...
This model is insane for its size.
Hmm, this might be the reason I'm seeing reports from people saying imatrix doesn't work properly with Llama 3 models yet (low-quality output).
@ikawrakow Any idea what could cause this? Have you done any tests so far regarding imatrix and IQ quants for Llama 3?
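For reference, the imatrix-based IQ quantization flow in question goes roughly like this; a sketch assuming llama.cpp binaries built as `imatrix` and `quantize` (newer builds name them `llama-imatrix` and `llama-quantize`), with the model and calibration file names as placeholders:

```
# 1. Compute an importance matrix over a calibration text
./imatrix -m llama-3-8b-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Feed it into quantization to produce an IQ quant
./quantize --imatrix imatrix.dat llama-3-8b-f16.gguf llama-3-8b-IQ2_XS.gguf IQ2_XS
```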
IMO, Ollama can be a nuisance for people who already have GGUF files, because it requires them to be converted into its own format. It also has no GUI, in contrast to koboldcpp and...
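For context, the extra import step being referred to looks like this; a minimal sketch, with the file and model names as placeholders:

```
# Modelfile: point Ollama at an existing GGUF on disk
FROM ./llama-3-8b-IQ2_XS.gguf
```

```
# Register and run it (Ollama copies the weights into its own blob store)
ollama create my-llama3 -f Modelfile
ollama run my-llama3
```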