> When deploying with vLLM, how can I load the model across multiple GPUs? With `CUDA_VISIBLE_DEVICES=0,1` only one GPU actually loads the model, which is strange. Thanks.

Try adding `--tp=2` to the launch arguments.
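Roughly the same thing through the Python API, as a minimal sketch (the model name is just a placeholder): `tensor_parallel_size=2` is the long form of `--tp=2` and tells vLLM to shard the weights across both visible GPUs.

```python
# Sketch: shard a model across 2 GPUs with vLLM tensor parallelism.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose both GPUs before vLLM initializes CUDA

from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",   # placeholder; use your own checkpoint
    tensor_parallel_size=2,      # same effect as --tp=2 on the CLI
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```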
Hi @hjc3613, you can offload to NVMe instead of CPU memory; please check out [nvme offload](https://www.deepspeed.ai/tutorials/zero/#offloading-to-cpu-and-nvme-with-zero-infinity).
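The relevant part of the ZeRO-3 config looks roughly like the sketch below (the NVMe path is a placeholder; see the linked tutorial for the full set of knobs and tuning advice):

```python
# Sketch of the ZeRO-3 section of a DeepSpeed config for NVMe offload.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",  # placeholder mount point of the NVMe drive
            "pin_memory": True,
        },
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
        },
    },
}
```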
You can achieve that by setting `fp32_optimizer_states=False` when initializing `DeepSpeedCPUAdam`; this parameter was added to DeepSpeed in version 0.14.3. Note: if you are using the transformers trainer, it will...
Make sure you are using `DeepSpeedCPUAdam`; you can find its signature here: [DeepSpeedCPUAdam](https://github.com/microsoft/DeepSpeed/blob/ffe0af23575c4f03a07408eacfc50b1a58781429/deepspeed/ops/adam/cpu_adam.py#L25)
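If you are constructing the optimizer yourself, it would look roughly like this sketch (import path taken from the file linked above; the model and hyperparameters are placeholders):

```python
# Sketch: keep the CPU optimizer states in the params' own dtype instead of fp32
# by passing fp32_optimizer_states=False (requires DeepSpeed >= 0.14.3).
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(1024, 1024).bfloat16()  # placeholder model
optimizer = DeepSpeedCPUAdam(
    model.parameters(),
    lr=1e-4,                      # placeholder hyperparameters
    weight_decay=0.01,
    fp32_optimizer_states=False,  # states stay in the params' dtype, cutting CPU memory
)
```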
Maybe you can refer to [partition_parameters.py](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/partition_parameters.py#L2044).