Ze Wang
docker run --gpus '"device=0,1,2,3"' \ --shm-size 1g \ -p 8081:80 \ -v /home/unionlab001/Model/qwen-72b:/data ghcr.io/predibase/lorax:latest \ --model-id /data/Qwen1_5-72B-Chat \ --trust-remote-code \ --quantize bitsandbytes-nf4 \ --max-batch-prefill-tokens 300 \ --max-input-length 200 \ --max-total-tokens...
@xqxls Has this issue been resolved?
Sure @magdyksaleh

```
docker run --gpus all \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-32b:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-32B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 1024 \
  --max-input-length...
```
> Note that you'll need to run with `--trust-remote-code` when launching LoRAX as the tokenizer is custom and hosted on HF.

I pulled the latest Docker image and set `--trust-remote-code` on...
> Thanks @felixstander!
>
> @KrisWongz it looks like the model weights `.bin` file is trying to execute some code on deserialization. I wasn't able to repro this using the...
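If the `.bin` really is running code on load, one common workaround is to convert the pickle-based checkpoint to safetensors once, in a trusted environment, so later loads carry no executable payload. A minimal sketch (the paths are illustrative, and a 72B checkpoint is sharded, so this would run per shard):

```python
# Sketch: convert a pickle-based PyTorch checkpoint to safetensors.
import torch
from safetensors.torch import save_file

state_dict = torch.load(
    "/data/Qwen1_5-72B-Chat/pytorch_model.bin",  # hypothetical shard path
    map_location="cpu",
    weights_only=True,  # refuse to unpickle arbitrary objects/code
)
# save_file rejects shared or non-contiguous tensors, so copy each one
state_dict = {k: v.contiguous().clone() for k, v in state_dict.items()}
save_file(state_dict, "/data/Qwen1_5-72B-Chat/model.safetensors")
```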
> Hey @KrisWongz, can you try using the param `stop_sequences` instead of `stop`?
>
> Example:
>
> ```
> client.generate(prompt, max_new_tokens=32, temperature=0.7, stop_sequences=["</s>"]).generated_text
> ```

It works on the...
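For anyone landing here, a minimal end-to-end sketch of that client call (assuming the `lorax-client` pip package, the `-p 8081:80` mapping from the docker commands above, and `<|im_end|>` as a placeholder Qwen1.5 chat stop token):

```python
# Sketch: query a LoRAX server using stop_sequences (not stop).
from lorax import Client

client = Client("http://127.0.0.1:8081")  # host port from -p 8081:80
resp = client.generate(
    "What is deep learning?",
    max_new_tokens=32,
    temperature=0.7,
    stop_sequences=["<|im_end|>"],  # assumed Qwen1.5 chat stop token
)
print(resp.generated_text)
```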