Alex

Results: 64 comments by Alex

> This is going to be related to #1046. Nexmon needs an update (see issue https://github.com/seemoo-lab/nexmon/issues/500) for the RPi02w. Are you sure, Alex? I'm not using nexmon...

It seems `load_quant()` in `modules/GPTQ_loader.py` needs to pass one more (new) positional argument, `groupsize`, to [qwopqwop200](https://github.com/qwopqwop200)/[GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). After correcting the SyntaxError, here's the trace: ``` Loading settings from /home/alex/oobabooga/settings.json......
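For context, here is a minimal sketch of the mismatch. The `load_quant()` below is only a stub standing in for the upstream function, and the four-argument signature is an assumption based on the error, not verbatim GPTQ-for-LLaMa code; model and checkpoint names are placeholders:

```
# Sketch only: stub standing in for GPTQ-for-LLaMa's load_quant(); the real
# function rebuilds the quantized model and loads the .pt checkpoint weights.
# Assumption: the updated signature gained a positional `groupsize` parameter.
def load_quant(model, checkpoint, wbits, groupsize):
    print(f"would load {checkpoint} at {wbits}-bit, groupsize={groupsize}")

# Old call in modules/GPTQ_loader.py -- now one positional argument short:
#   model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)

# A call matching the assumed new signature also passes a group size, whose
# value has to match how the .pt file was quantized (placeholder paths):
model = load_quant("llama-30b", "llama-30b-4bit.pt", 4, -1)
```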

> Thanks, passing the value triggers another exception: ``` Loading settings from /home/alex/oobabooga/settings.json... Loading llama-30b... Loading model ... Traceback (most recent call last): File "/home/alex/oobabooga/text-generation-webui/server.py", line 241, in shared.model, shared.tokenizer...

> Yeah, it looks like there are more issues with the GPTQ changes today than just syntax. I rolled back the GPTQ repo to yesterday's version without any of his changes...

> > If anyone needs a known-good hash to roll back to, you can reset here (make sure to run this in the GPTQ-for-LLaMa repo, of course)...

I 'fixed' inference by:

```
cd repositories/GPTQ-for-LLaMa
git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
pip install -r requirements.txt
python install_cuda.py install
```

Today's changes break things, however.

> > I 'fixed' inference by:
> > ```
> > cd repositories/GPTQ-for-LLaMa
> > git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4
> > pip install -r requirements.txt
> > python install_cuda.py install
> > ```

...

In any case, I reported https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/62 to [qwopqwop200](https://github.com/qwopqwop200)/[GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

[qwopqwop200](https://github.com/qwopqwop200) replied that, as of today, LLaMA models need to be re-quantized to work with the newest code. I'll test and report back ;-)

To sum up:

- Re-quantized the HF LLaMA model(s) to 4-bit GPTQ with the latest [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) code.
- In `modules/GPTQ_loader.py`, changed `model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)` to `model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)`.

Works for me, tested...
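One caveat worth flagging: the trailing `-1` presumably has to match how the checkpoint was quantized (here it is assumed to mean "no group size"); a model re-quantized with, say, a group size of 128 would presumably need 128 instead. Below is a hedged sketch of keeping that value in one place; `GPTQ_GROUPSIZE` and `build_load_quant_args()` are hypothetical names for illustration, not existing webui settings or functions:

```
# Hedged sketch: keep the group size in one place instead of hard-coding -1 at
# the call site. GPTQ_GROUPSIZE is a hypothetical constant, not an existing
# text-generation-webui setting; -1 is assumed to mean "quantized without a
# group size", while e.g. 128 would match a checkpoint quantized in 128-column groups.
GPTQ_GROUPSIZE = -1

def build_load_quant_args(path_to_model, pt_path, gptq_bits):
    """Assemble the positional arguments for load_quant() under the assumed
    new four-argument signature."""
    return (str(path_to_model), str(pt_path), gptq_bits, GPTQ_GROUPSIZE)

# Usage inside modules/GPTQ_loader.py would look roughly like:
#   model = load_quant(*build_load_quant_args(path_to_model, pt_path,
#                                             shared.args.gptq_bits))
print(build_load_quant_args("models/llama-30b", "llama-30b-4bit.pt", 4))
```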