3DUX-Net

Out-of-GPU-memory error when training reaches the validation stage

Open guli-7721 opened this issue 2 years ago • 3 comments

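I tweaked the code slightly so that the model runs on two GPUs, but it still runs out of GPU memory. With max_iter set to 100,000, eval_step set to 1000, batch_size set to 1, and num_workers set to 0, running on two GPUs still produces the out-of-memory error, even though each GPU shows only about 7 GB in use, so there should be plenty of memory left. I can't figure out the cause. Could anyone suggest how to solve this? Thanks!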

image

guli-7721 avatar Aug 09 '23 07:08 guli-7721

How do I set up multi-GPU training? Which parts of the code need to be changed?

landonmax0 avatar Apr 01 '24 08:04 landonmax0

I have the same problem. Have you solved it?

dream-mjq avatar Apr 09 '24 13:04 dream-mjq

I solved it. If you use your own dataset, you need to change the dataset name. It is set in load_dataset_transforms.py: change "flare" there to your own dataset name. I hope this helps!

guli-7721 avatar Apr 10 '24 03:04 guli-7721

I am closing the older bug reports as these were missed. We are now better tracking reports across the organization. Please re-open if this continues to be a blocker.

BennettLandman avatar Aug 01 '24 16:08 BennettLandman

Do you have the complete modified code? The original code has too many errors.

RY-97 avatar Sep 09 '24 01:09 RY-97

@guli-7721

RY-97 avatar Sep 09 '24 01:09 RY-97
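
The thread never confirms why memory runs out only at validation while each GPU shows about 7 GB in use during training. One possible explanation, stated here purely as an assumption, is that validation works on whole volumes while training works on small crops, so the memory reading taken during training says little about the validation peak. The sketch below is plain arithmetic to illustrate the size gap; the class count, patch size, and scan size are hypothetical and not taken from the thread or the repository.

```python
# Back-of-the-envelope size of one dense float32 tensor.
# All shapes below are hypothetical, chosen only for illustration.

def tensor_bytes(channels, depth, height, width, bytes_per_value=4):
    """Bytes needed for one dense tensor of shape (channels, D, H, W)."""
    return channels * depth * height * width * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


# Hypothetical training crop: 14 output classes over a 96 x 96 x 96 patch.
patch = tensor_bytes(14, 96, 96, 96)

# Hypothetical validation scan: the same 14 classes over 300 x 512 x 512 voxels.
volume = tensor_bytes(14, 300, 512, 512)

print(f"patch logits:  {to_gib(patch):.3f} GiB")
print(f"volume logits: {to_gib(volume):.3f} GiB")
print(f"ratio: {volume / patch:.1f}x")
```

If this is what happens, steps that may help are running the validation loop under torch.no_grad() so no activations are kept for backpropagation, and, when validation uses MONAI's sliding_window_inference, passing device="cpu" so the stitched full-volume output is held in host memory while each window is still computed on the GPU. Whether either applies depends on how the validation loop in the training script is written.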