Vishvendra Singh


You can use commands like the one below:

```
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --gptq-ckpt models/vicuna-7B-1.1-GPTQ-4bit-128g/vicuna-7B-1.1-GPTQ-4bit-128g.safetensors \
    --gptq-wbits 4 \
    --gptq-groupsize 128 \
    --gptq-act-order
```

For this to work, you need to install https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-inference-4bit...
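Note that the model worker alone does not answer requests: in FastChat it registers with a controller, and an API server fronts the stack. A minimal sketch of the remaining pieces, assuming FastChat's default host and port:

```
# Sketch only: start the controller the worker above registers with,
# then the OpenAI-compatible API server (localhost:8000 is an assumed default).
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```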

The best way to beat OpenAI's pricing is to use your own deployed LLMs with FastChat or textgen-ui; both expose a nice OpenAI-compatible API. However, an M1 laptop can easily...
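As a hedged sketch of what querying that OpenAI-compatible API can look like in practice (the port and model name here are assumptions based on FastChat's defaults and the worker command above, not values given in the original comments):

```
# Sketch only: hit the locally served model through the OpenAI-compatible
# chat completions endpoint; the model name is whatever the worker registered.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "vicuna-7B-1.1-GPTQ-4bit-128g",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```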