candre23
> Model directly works 👍

Only partially. MS is using some new rope technique they're calling "longrope". As-is, LCPP will work ok for the first few gens but will then...
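For anyone curious what "longrope" actually changes: as I understand it, instead of one uniform RoPE stretch it applies a separate rescale factor to each rotary frequency. A minimal numpy sketch of that per-dimension idea (illustrative only, not llama.cpp's or Microsoft's code; the factor values are made up):

```python
import numpy as np

def rope_angles(pos, head_dim, base=10000.0, rescale=None):
    # Standard RoPE inverse frequencies, one per dimension pair.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # Longrope-style scaling (as I understand it): a *separate* factor per
    # dimension rather than one global stretch. Factors below are hypothetical.
    if rescale is not None:
        inv_freq = inv_freq / np.asarray(rescale)
    return pos * inv_freq

print(rope_angles(4096, head_dim=8))                                # vanilla RoPE
print(rope_angles(4096, head_dim=8, rescale=[1.0, 1.3, 2.7, 4.0]))  # per-dim scaled
```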
Another sparse MoE implementation: https://github.com/predibase/lorax They make a lot of claims that are big if true. But who doesn't these days? https://predibase.com/blog/lora-land-fine-tuned-open-source-llms-that-outperform-gpt-4
I tried a few more times, and it seems to be hard-crashing the machine and causing a reboot every time it fails now. Here's the full -v terminal output from shortly...
Based on the other issue explaining how lazy-unpickle works, I'm wondering if it's not recognizing the format of the 103b stacked models, and that's why it's not using that method...
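To spell out the distinction I mean (this is not mergekit's actual lazy-unpickle code, just the general idea): a memory-mapped load only pages tensors in as they're touched, while a plain torch.load deserializes the whole state dict into RAM at once, which is the behavior you'd expect if the checkpoint format isn't recognized.

```python
import torch

def load_tensor(ckpt_path, name):
    """Sketch of lazy vs. eager checkpoint access (not mergekit's code).

    mmap=True (torch >= 2.1) maps the file and only pages in the tensors that
    are actually accessed; the fallback path loads the entire state dict into
    RAM in one go, which would explain very high memory use on a big stacked
    merge. ckpt_path/name are placeholders.
    """
    try:
        state = torch.load(ckpt_path, map_location="cpu", mmap=True, weights_only=True)
    except (TypeError, RuntimeError):
        # Older torch or an unsupported file layout: fall back to eager loading.
        state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    return state[name]
```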
Having nothing to lose, I tried running this from within WSL (on the same machine) and the merge completed. Memory usage was still quite high - over 40GB and still...
Not sure if this is related to this issue specifically, but iQ3 quants of L3 are definitely broken right now. Strangely, iQ4 quants seem OK. Here are some PPL calcs I...
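For anyone sanity-checking the numbers: perplexity is just exp of the negative mean log-probability the model assigns to the ground-truth tokens. Quick sketch of the calculation itself; the logprob values below are made up for illustration, not my actual results.

```python
import math

def perplexity(token_logprobs):
    # PPL = exp(-mean(log p)) over the evaluated tokens. token_logprobs are the
    # natural-log probabilities the model gave each ground-truth token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical: a healthy quant vs. one that badly misranks the right tokens.
print(perplexity([-1.8, -2.1, -1.5, -2.0]))   # lower PPL
print(perplexity([-4.9, -5.3, -4.7, -5.1]))   # much higher PPL
```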
> file: manual_update.bat placed in extensions folder

I ran this in the extensions directory and it successfully updated all the extensions that were out of date. However, after hitting the...
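For anyone wary of running a random .bat: what it presumably boils down to is looping over the extension checkouts and pulling each one. A rough Python equivalent (the paths are assumptions, not the script's actual contents):

```python
import pathlib
import subprocess

# Assumed layout: each extension lives in its own git checkout under
# text-generation-webui/extensions. Adjust the path to your install.
extensions_dir = pathlib.Path("text-generation-webui/extensions")

for ext in sorted(p for p in extensions_dir.iterdir() if (p / ".git").exists()):
    print(f"Updating {ext.name}...")
    # check=False so one broken extension doesn't abort the rest of the loop.
    subprocess.run(["git", "-C", str(ext), "pull", "--ff-only"], check=False)
```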
P40 weirdness seems to be even stranger than just "it's slow". I wanted to chart VRAM usage for different models at different prompt context sizes, and the results were... impossible?...
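In case the measurement itself is suspect: a simple way to log the numbers for a chart like this is to poll nvidia-smi after each model load / prompt. Sketch below; the surrounding model-and-context loop isn't shown and is up to you.

```python
import subprocess

def vram_used_mib(gpu_index=0):
    # Read current VRAM usage (MiB) for one GPU via nvidia-smi's query mode.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", "-i", str(gpu_index)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

# e.g. record usage after loading a model or processing a prompt of a given size
print(vram_used_mib(0))
```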
Ah, my apologies. I had no idea it was allocating memory for max context, regardless of how much context was actually being fed in. In retrospect, that perfectly explains what...
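For anyone else tripped up by the same thing: the KV cache is pre-allocated for the configured max context, so VRAM scales with the context setting, not with the prompt you actually send. A rough back-of-envelope (the model dimensions below are hypothetical, and exact numbers depend on the backend and cache dtype):

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V tensors, one pair per layer, allocated for the full max context.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical 70B-class model (80 layers, 8 KV heads, head_dim 128), fp16 cache:
print(kv_cache_bytes(80, 8192, 8, 128) / 2**30, "GiB")  # 2.5 GiB at 8k max context
```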
I'm actually doing this in oobabooga, not exllama proper. My ooba install is up to date, but I have no clue if their implementation is up to date with your...