KUANWB

Results: 11 comments by KUANWB

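Doesn't this approach still require running inference on a single GPU? What I want is to load this 62 GB model across multiple GPUs for inference, because each GPU here has only 40 GB of memory.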
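The arithmetic behind this question can be sketched as follows: 62 GB of weights cannot fit on one 40 GB card, so consecutive layers have to be split across cards. This is a minimal sketch under stated assumptions, not any library's algorithm. `min_gpus`, `assign_layers` and the `reserve_gb` headroom for activations and CUDA context are all hypothetical, and real layer sizes would come from the checkpoint. Tools such as Hugging Face Accelerate's `device_map="auto"` automate this kind of placement.

```python
import math


def min_gpus(model_gb: float, gpu_gb: float, reserve_gb: float = 4.0) -> int:
    """Lower bound on the number of GPUs needed just to hold the weights.

    reserve_gb is a hypothetical per-GPU allowance for activations,
    KV cache and CUDA context; the true value depends on batch size
    and sequence length.
    """
    usable = gpu_gb - reserve_gb
    if usable <= 0:
        raise ValueError("reserve_gb must be smaller than gpu_gb")
    return math.ceil(model_gb / usable)


def assign_layers(layer_gb: list, gpu_gb: float, reserve_gb: float = 4.0) -> list:
    """Greedily place consecutive layers on GPU 0, 1, 2, ... (pipeline-style split).

    Returns the GPU index chosen for each layer, in layer order.
    """
    usable = gpu_gb - reserve_gb
    placement = []
    gpu, used = 0, 0.0
    for size in layer_gb:
        if size > usable:
            raise ValueError(f"a single {size} GB layer does not fit on one GPU")
        if used + size > usable:
            # Current GPU is full: move on to the next one.
            gpu += 1
            used = 0.0
        placement.append(gpu)
        used += size
    return placement


if __name__ == "__main__":
    print(min_gpus(62, 40))  # 62 GB over ~36 GB usable per card
    print(assign_layers([1.0] * 62, 40))
```

With a split like this the GPUs run one after another for each forward pass, so it saves memory rather than adding speed.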

May I ask what hardware you are using?

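This probably isn't a GPU memory issue, though I'm not sure whether the T4 hardware is up to it. On my side I'm using eight 40 GB A100s and I get an OOM error.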

What are the hardware requirements for fine-tuning? A 40 GB A100 doesn't seem to be enough to run it.
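A rough way to see why a single 40 GB A100 struggles is a commonly cited rule of thumb for full fine-tuning with Adam in mixed precision: about 16 bytes per parameter, before activations. The sketch below only encodes that rule of thumb. `full_finetune_gb` is a hypothetical helper, and actual usage depends on the training setup.

```python
def full_finetune_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Rough memory (GB) for full fine-tuning with Adam in mixed precision.

    16 bytes/param = fp16 weights (2) + fp16 gradients (2)
    + fp32 master weights (4) + Adam first and second moments (4 + 4).
    Activations are NOT included. Pass bytes_per_param=2 for fp16 weights only.
    """
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n in [("7B", 7e9), ("13B", 13e9)]:
        print(name, full_finetune_gb(n), "GB for training,",
              full_finetune_gb(n, bytes_per_param=2), "GB for fp16 weights")
```

Even a 7B model comes out at roughly 112 GB by this estimate, which is why full fine-tuning is usually sharded across GPUs or replaced with a parameter-efficient method.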

Thank you very much for your advice! We have already reduced the prompt list to a single prompt, but it still raises 'OutOfMemoryError'. We will try the 13B model later.

> @KUANWB One workaround is to run two separate instances and let them share their weights over middleware. Seems like an overkill for single-machine distribution, but you could also distribute...

> check here: https://github.com/juncongmoo/pyllama one gpu is enough

Copy that! Thank you very much!

> > > > AssertionError: Loading a checkpoint for MP=0 but world size is 1 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1769) of binary: /usr/bin/python3 > > I have the...
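The `Loading a checkpoint for MP=0 but world size is 1` message appears to come from the check in the original LLaMA `example.py`. That check counts the `*.pth` files in the checkpoint directory and asserts that the count equals the number of launched processes. If so, `MP=0` most likely means no checkpoint files were found at the given path. The sketch below shows a pre-flight check under that assumption. `shard_problem` and `check_ckpt_dir` are hypothetical helpers, not part of the LLaMA code.

```python
from pathlib import Path
from typing import Optional


def shard_problem(n_shards: int, world_size: int) -> Optional[str]:
    """Describe a mismatch between checkpoint shards and processes, or return None."""
    if n_shards == 0:
        return "no *.pth checkpoint files found (MP=0): check the checkpoint directory path"
    if n_shards != world_size:
        return (f"checkpoint has {n_shards} shard(s) but world size is {world_size}: "
                f"launch torchrun with --nproc_per_node {n_shards}")
    return None


def check_ckpt_dir(ckpt_dir: str, world_size: int) -> list:
    """Count *.pth files (as the error message implies) and fail early with a clear message."""
    shards = sorted(Path(ckpt_dir).glob("*.pth"))
    problem = shard_problem(len(shards), world_size)
    if problem is not None:
        raise RuntimeError(problem)
    return shards
```

Running `check_ckpt_dir(path, world_size)` before loading turns the bare assertion into a message that says whether the path is wrong or the process count is.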