KUANWB comments

Results: 11 comments of KUANWB

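Running the fine-tuning code on 8x A100 40 GB GPUs with bsz=1 raises an out-of-memory error. What are the minimum hardware requirements for training?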

OK, I switched DeepSpeed to offload to the CPU and that fixed it.
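For reference, CPU offload of this kind is normally switched on in the `zero_optimization` section of the DeepSpeed config. The following is a minimal sketch, assuming ZeRO stage 3 and a micro-batch size of 1; the settings actually used in this thread are not shown, and the helper name `build_offload_config` is hypothetical.

```python
import json


def build_offload_config(stage=3, micro_batch_size=1):
    """Build a DeepSpeed config dict that offloads ZeRO state to CPU memory.

    Assumption: the stage and batch size are illustrative defaults,
    not values confirmed by the thread.
    """
    zero = {
        "stage": stage,
        # Keep optimizer states in host RAM instead of GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
    if stage == 3:
        # Parameter offload is only available with ZeRO stage 3.
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    # Print the JSON that would go into the DeepSpeed config file.
    print(json.dumps(build_offload_config(), indent=2))
```

Offloading trades GPU memory for host RAM and PCIe traffic, so training steps are usually slower than with everything resident on the GPU.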

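Running the fine-tuning code on 8x A100 40 GB GPUs with bsz=1 raises an out-of-memory error. What are the minimum hardware requirements for training?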

None of your business.

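After training with the fine-tuning code, the resulting pytorch_model.bin is 62 GB. Is there a way to split it into multiple files that still match the format the inference code expects?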

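Wouldn't that still require running inference on a single GPU... What I want now is to load this 62 GB model across multiple GPUs for inference, because each GPU has only 40 GB of memory.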
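As background on what a split checkpoint looks like: Hugging Face `transformers` can write sharded checkpoints through the `max_shard_size` argument of `save_pretrained`, and `from_pretrained` accepts `device_map="auto"` (with `accelerate` installed) to spread layers over several GPUs. Whether the inference code discussed here accepts that layout is not confirmed in the thread. The sketch below only illustrates the idea of the shard index, using a greedy grouping that is an assumption rather than the library's exact algorithm; the function name and tensor names are hypothetical.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensors into shards of at most max_shard_bytes.

    tensor_sizes: dict mapping tensor name -> size in bytes, in checkpoint order.
    Returns (weight_map, shard_totals), where weight_map maps each tensor
    name to the shard file that would hold it.
    A single tensor larger than the limit gets a shard of its own.
    """
    shards = [[]]
    totals = [0]
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if shards[-1] and totals[-1] + size > max_shard_bytes:
            shards.append([])
            totals.append(0)
        shards[-1].append(name)
        totals[-1] += size

    count = len(shards)
    weight_map = {}
    for index, names in enumerate(shards, start=1):
        filename = f"pytorch_model-{index:05d}-of-{count:05d}.bin"
        for name in names:
            weight_map[name] = filename
    return weight_map, totals


if __name__ == "__main__":
    # Hypothetical tensor names and sizes, for illustration only.
    sizes = {"layer0.weight": 4, "layer1.weight": 4, "layer2.weight": 4}
    mapping, totals = plan_shards(sizes, max_shard_bytes=8)
    for tensor_name, shard_file in mapping.items():
        print(tensor_name, "->", shard_file)
    print("bytes per shard:", totals)
```

Splitting the file alone does not reduce the total memory needed: a 62 GB checkpoint still has to be spread over at least two 40 GB GPUs at load time.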

Error when running run.sh for fine-tuning

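This is probably not a GPU memory problem, though I don't know whether T4 hardware is sufficient. On my side, with 8x A100 40 GB, I get an OOM error.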

Will the fine-tuning script finetune_moss.py support the plugin version in the future?

What are the hardware requirements for fine-tuning? It doesn't seem to run on A100 40 GB.

We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs

Thank you very much for your advice! We have already reduced the prompt list to a single prompt, but it still raises the 'OutOfMemoryError'. We will try the 13B model later....

We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs

> @KUANWB One workaround is to run two separate instances and let them share their weights over middleware. Seems like an overkill for single-machine distribution, but you could also distribute...

We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs

> check here: https://github.com/juncongmoo/pyllama one gpu is enough

Copy that! Thank you very much!

We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs

> > > > AssertionError: Loading a checkpoint for MP=0 but world size is 1
> > > > ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1769) of binary: /usr/bin/python3
> >
> > I have the...