
Qwen2-72B, 16K long context: OOM when converting to an HF model with `convert`

Open daiyafei2013 opened this issue 1 year ago • 3 comments

The 16K long-context model has finished training, but converting it to an HF model hits an OOM. I have six GPUs, yet it seems only one of them is being used. How should I change the config file?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 116.00 MiB. GPU 0 has a total capacity of 79.33 GiB of which 39.81 MiB is free. Process 1333184 has 79.28 GiB memory in use. Of the allocated memory 78.59 GiB is allocated by PyTorch, and 218.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
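The numbers in the error are consistent with the entire 72B model being loaded onto GPU 0 alone. A quick back-of-the-envelope check (assuming half-precision weights, 2 bytes per parameter; activations and buffers would need even more):

```python
# Rough estimate of the memory needed just for Qwen2-72B weights in fp16/bf16.
params = 72e9            # ~72 billion parameters
bytes_per_param = 2      # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.0f} GiB")  # ~134 GiB of weights vs ~80 GiB on one GPU
```

So no allocator tuning (e.g. `PYTORCH_CUDA_ALLOC_CONF`) can make the model fit on a single 80 GiB card; the weights must be spread across multiple GPUs.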

daiyafei2013 avatar Jul 15 '24 04:07 daiyafei2013

I ran into the same problem — did you find a solution?

Diyigelieren avatar Oct 21 '24 03:10 Diyigelieren

I ran into the same problem — did you find a solution?

Solved: you need to add a `device_map='auto'` parameter to the `llm` dict inside `model` in the config.
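The fix amounts to a one-line change in the training config. A minimal sketch of the relevant fragment (field names such as `pretrained_model_name_or_path` follow typical xtuner configs; match them to your own file):

```python
import torch
from transformers import AutoModelForCausalLM

# Fragment of an xtuner config: only the `llm` dict inside `model` changes.
llm = dict(
    type=AutoModelForCausalLM.from_pretrained,
    pretrained_model_name_or_path='Qwen/Qwen2-72B',  # your base model path
    torch_dtype=torch.float16,
    device_map='auto',  # NEW: shard the weights across all visible GPUs
)
```

With `device_map='auto'`, `transformers` (via `accelerate`) places layers across every visible GPU, and spills to CPU if needed, instead of loading the full model onto GPU 0.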

Diyigelieren avatar Oct 21 '24 04:10 Diyigelieren

How much hardware did you use for 72B with 16K context? Would 8x 80G GPUs be enough?

lmc8133 avatar Oct 31 '24 12:10 lmc8133