Ze Wang

6 comments by Ze Wang

```
docker run --gpus '"device=0,1,2,3"' \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-72b:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-72B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 300 \
  --max-input-length 200 \
  --max-total-tokens ...
```

Sure @magdyksaleh

> ```
> docker run --gpus all \
>   --shm-size 1g \
>   -p 8081:80 \
>   -v /home/unionlab001/Model/qwen-32b:/data \
>   ghcr.io/predibase/lorax:latest \
>   --model-id /data/Qwen1_5-32B-Chat \
>   --trust-remote-code \
>   --quantize bitsandbytes-nf4 \
>   --max-batch-prefill-tokens 1024 \
>   --max-input-length ...
> ```
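For reference, once a container like the ones above is running, the server can be queried over the mapped host port (8081 here) via LoRAX's REST `generate` endpoint. A minimal sketch; the prompt text and parameter values are illustrative assumptions:

```
# Query the LoRAX server exposed on host port 8081 (-p 8081:80 above).
curl http://127.0.0.1:8081/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'
```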

> Note that you'll need to run with `--trust-remote-code` when launching LoRAX as the tokenizer is custom and hosted on HF.

I pulled the latest Docker image and set `--trust-remote-code` on...

> Thanks @felixstander!
>
> @KrisWongz it looks like the model weights `.bin` file is trying to execute some code on deserialization. I wasn't able to repro this using the...

> Hey @KrisWongz, can you try using the param `stop_sequences` instead of `stop`?
>
> Example:
>
> ```
> client.generate(prompt, max_new_tokens=32, temperature=0.7, stop_sequences=[""]).generated_text
> ```

It works on the...
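For context, a minimal end-to-end sketch of the call quoted above using the LoRAX Python client (`pip install lorax-client`); the endpoint URL, prompt, and stop string are assumptions chosen to match the port mapping in the docker commands above:

```python
# Minimal sketch using the LoRAX Python client; the endpoint and prompt are
# illustrative assumptions based on the docker commands above (-p 8081:80).
from lorax import Client

client = Client("http://127.0.0.1:8081")

prompt = "What is deep learning?"
response = client.generate(
    prompt,
    max_new_tokens=32,
    temperature=0.7,
    stop_sequences=["</s>"],  # assumed stop string; truncate output at these
)
print(response.generated_text)
```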