WangxuP

Results: 21 comments of WangxuP

When starting the Docker container, set the xinference environment variable XINFERENCE_HOME, and then also mount the directory it points to as a volume:

```
docker run -v /home/models/:/home/models/ \
  -v /nfs/models/xinference/xinference_cache/:/root/xinference_cache \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/root/xinference_cache \
  -d -p 9998:9997 --gpus all --name ffff --shm-size=256g \
  xprobe/xinference:v0.10.1 \
  xinference-local -H 0.0.0.0 --log-level debug
```

What value did you set content-length to when registering the model?

> The core feature is supported, but we don't have a checkpoint to demonstrate. You could modify the lora_manager to load multiple LoRA weights.

I read the `lora_manager.py` source code. It was...
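As a rough illustration of that suggestion, here is a minimal sketch of serving several LoRA adapters on top of one shared base model. It uses the Hugging Face `peft` API as a stand-in for TensorRT-LLM's `lora_manager`, and the model and adapter paths are hypothetical placeholders, so treat it as the general pattern rather than the actual TensorRT-LLM code.

```python
# Sketch: several LoRA adapters sharing one base model.
# peft is used here as a stand-in for TensorRT-LLM's lora_manager;
# all model/adapter paths below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the first adapter, then register a second one on the same base weights.
model = PeftModel.from_pretrained(base, "/path/to/lora-A", adapter_name="lora_A")
model.load_adapter("/path/to/lora-B", adapter_name="lora_B")

# Switch adapters per request without reloading the base model.
model.set_adapter("lora_B")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because both adapters reference the same frozen base weights, only the small LoRA matrices are duplicated in memory when more adapters are registered.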

> > > The core feature is supported, but we don't have a checkpoint to demonstrate. You could modify the lora_manager to load multiple LoRA weights.
>
> ...

> They share the same base model. We have an example here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-several-lora-checkpoints

OK, thanks a lot!

As in the title: how can a 33B model be loaded? Any guidance would be appreciated.

When I used the inference code you provided, there was still a problem, shown below:

```
CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
    --base_model "WizardCoder-15B-V1.0" \
    --input_data_path "data.jsonl" \
    --output_data_path...
```

@ChiYeungLaw

> ```
> CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
>     --base_model "WizardCoder-15B-V1.0" \
>     --input_data_path "data.jsonl" \
>     --output_data_path "result.jsonl"
> ```
>
> This works fine on our machine. Which...

> ```
> CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
>     --base_model "WizardCoder-15B-V1.0" \
>     --input_data_path "data.jsonl" \
>     --output_data_path "result.jsonl"
> ```
>
> This works fine on our machine. Which version...

@ChiYeungLaw My guess is that the problem is caused by a compatibility issue between CUDA 11.8 and the H800 GPUs. Since the rented server has already expired, I can't verify the transformers issue for now.

A separate question: we know the WizardCoder-15B-V1.0 model is roughly 32 GB, so after loading it in single-machine multi-GPU mode (2×V100), should the memory consumption on each card be around 32 GB / 2 = 16 GB? Could you help verify this, and post the GPU memory usage for both a single-GPU run and a two-GPU run? Thanks a lot!
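For anyone wanting to check this themselves, below is a minimal sketch of measuring per-GPU memory after sharding the model across the visible cards. It assumes a transformers + accelerate setup with `device_map="auto"`; the model id is a placeholder for a local path, and the even 16 GB split is only the naive expectation, since `device_map="auto"` balances layers rather than bytes.

```python
# Sketch: shard a model across CUDA_VISIBLE_DEVICES and report per-card memory.
# Assumes transformers and accelerate are installed; model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "WizardLM/WizardCoder-15B-V1.0",  # placeholder; substitute a local path
    torch_dtype=torch.float16,        # keep weights in half precision
    device_map="auto",                # spread layers across the visible GPUs
)

for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    print(f"GPU {i}: allocated={allocated:.1f} GiB, reserved={reserved:.1f} GiB")
```

With two visible GPUs, each card should hold roughly half of the weights, plus some per-device activation buffers and CUDA context overhead, so the observed usage will likely sit somewhat above the naive 16 GB estimate.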