cidtrips

4 comments by cidtrips

Pretty sure this is a side effect of using a model quantized with a newer version of GPTQ-for-LLaMa. Until GPTQ-for-LLaMa is updated in this package, or more likely, the wheel...

> TheBloke has one of his unfiltered models quantized with --no-act-order up on huggingface. That should work for you, and may be a little more helpful than the filtered model.

I've tried several different ways of merging the GPTQ code with fastchat, but I keep breaking down at running a 4-bit quantized model on multiple GPUs. I go back and...

Honestly, it's updating to transformers 4.30, adding one other dependency package, and about 8 changes in the code, if I recall correctly. Plus it works with multiple GPUs. Unfortunately I lost...