WangxuP

Results: 21 comments of WangxuP

When starting the Docker container, set the xinference environment variable XINFERENCE_HOME, and then also mount the directory it points to as a volume:

```
docker run -v /home/models/:/home/models/ \
  -v /nfs/models/xinference/xinference_cache/:/root/xinference_cache \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/root/xinference_cache \
  -d -p 9998:9997 --gpus all --name ffff --shm-size=256g \
  xprobe/xinference:v0.10.1 \
  xinference-local -H 0.0.0.0 --log-level debug
```

What value did you set content-length to when registering the model?

> The core feature is supported, but we don't have a checkpoint to demonstrate. You could modify the lora_manager to load multiple LoRA weights.

I read the `lora_manager.py` source code. It was...
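As a rough illustration of that suggestion, here is a minimal sketch of serving several LoRA adapters on top of one shared base model. It uses the Hugging Face `peft` API as a stand-in for TensorRT-LLM's `lora_manager`, and the model and adapter paths are hypothetical placeholders, so treat it as the general pattern rather than the actual TensorRT-LLM code.

```python
# Sketch: several LoRA adapters sharing one base model.
# peft is used here as a stand-in for TensorRT-LLM's lora_manager;
# all model/adapter paths below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the first adapter, then register a second one on the same base weights.
model = PeftModel.from_pretrained(base, "/path/to/lora-A", adapter_name="lora_A")
model.load_adapter("/path/to/lora-B", adapter_name="lora_B")

# Switch adapters per request without reloading the base model.
model.set_adapter("lora_B")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because both adapters reference the same frozen base weights, only the small LoRA matrices are duplicated in memory when more adapters are registered.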

> > > The core feature is supported, but we don't have a checkpoint to demonstrate. You could modify the lora_manager to load multiple LoRA weights.
>
> ...

> They share the same base model. We have an example here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-several-lora-checkpoints

OK, thanks a lot!

As in the title: how can a 33B model be loaded? Any guidance would be appreciated.

When I used the inference code you provided, there was still a problem, shown below:

```
CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
    --base_model "WizardCoder-15B-V1.0" \
    --input_data_path "data.jsonl" \
    --output_data_path...
```

@ChiYeungLaw

> ```
> CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
>     --base_model "WizardCoder-15B-V1.0" \
>     --input_data_path "data.jsonl" \
>     --output_data_path "result.jsonl"
> ```
>
> This works fine on our machine. Which...

> ```
> CUDA_VISIBLE_DEVICES=6,7 python src\inference_wizardcoder.py \
>     --base_model "WizardCoder-15B-V1.0" \
>     --input_data_path "data.jsonl" \
>     --output_data_path "result.jsonl"
> ```
>
> This works fine on our machine. Which version...

@ChiYeungLaw My guess is that the problem is caused by a compatibility issue between CUDA 11.8 and the H800 GPUs. Since the rented server has already expired, I can't verify the transformers issue for now.

A separate question: we know the WizardCoder-15B-V1.0 model is roughly 32 GB, so after loading it in single-machine multi-GPU mode (2×V100), should the memory consumption on each card be around 32 GB / 2 = 16 GB? Could you help verify this, and post the GPU memory usage for both a single-GPU run and a two-GPU run? Thanks a lot!
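For anyone wanting to check this themselves, below is a minimal sketch of measuring per-GPU memory after sharding the model across the visible cards. It assumes a transformers + accelerate setup with `device_map="auto"`; the model id is a placeholder for a local path, and the even 16 GB split is only the naive expectation, since `device_map="auto"` balances layers rather than bytes.

```python
# Sketch: shard a model across CUDA_VISIBLE_DEVICES and report per-card memory.
# Assumes transformers and accelerate are installed; model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "WizardLM/WizardCoder-15B-V1.0",  # placeholder; substitute a local path
    torch_dtype=torch.float16,        # keep weights in half precision
    device_map="auto",                # spread layers across the visible GPUs
)

for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    print(f"GPU {i}: allocated={allocated:.1f} GiB, reserved={reserved:.1f} GiB")
```

With two visible GPUs, each card should hold roughly half of the weights, plus some per-device activation buffers and CUDA context overhead, so the observed usage will likely sit somewhat above the naive 16 GB estimate.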