Ze Wang

6 comments by Ze Wang

```
docker run --gpus '"device=0,1,2,3"' \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-72b:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-72B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 300 \
  --max-input-length 200 \
  --max-total-tokens ...
```

Sure @magdyksaleh

> ```
> docker run --gpus all \
>   --shm-size 1g \
>   -p 8081:80 \
>   -v /home/unionlab001/Model/qwen-32b:/data \
>   ghcr.io/predibase/lorax:latest \
>   --model-id /data/Qwen1_5-32B-Chat \
>   --trust-remote-code \
>   --quantize bitsandbytes-nf4 \
>   --max-batch-prefill-tokens 1024 \
>   --max-input-length ...
> ```
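For reference, once a container like the ones above is running, the server can be queried over the mapped host port (8081 here) via LoRAX's REST `generate` endpoint. A minimal sketch; the prompt text and parameter values are illustrative assumptions:

```
# Query the LoRAX server exposed on host port 8081 (-p 8081:80 above).
curl http://127.0.0.1:8081/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'
```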

> Note that you'll need to run with `--trust-remote-code` when launching LoRAX as the tokenizer is custom and hosted on HF.

I pulled the latest Docker image and set `--trust-remote-code` on...

> Thanks @felixstander!
>
> @KrisWongz it looks like the model weights `.bin` file is trying to execute some code on deserialization. I wasn't able to repro this using the...

> Hey @KrisWongz, can you try using the param `stop_sequences` instead of `stop`?
>
> Example:
>
> ```
> client.generate(prompt, max_new_tokens=32, temperature=0.7, stop_sequences=[""]).generated_text
> ```

It works on the...
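For context, a minimal end-to-end sketch of the call quoted above using the LoRAX Python client (`pip install lorax-client`); the endpoint URL, prompt, and stop string are assumptions chosen to match the port mapping in the docker commands above:

```python
# Minimal sketch using the LoRAX Python client; the endpoint and prompt are
# illustrative assumptions based on the docker commands above (-p 8081:80).
from lorax import Client

client = Client("http://127.0.0.1:8081")

prompt = "What is deep learning?"
response = client.generate(
    prompt,
    max_new_tokens=32,
    temperature=0.7,
    stop_sequences=["</s>"],  # assumed stop string; truncate output at these
)
print(response.generated_text)
```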