
Large memory requirement when implementing sds loss using SV3D

Open yifliu3 opened this issue 1 year ago • 3 comments

Dear devs,

Thanks for open-sourcing this great work! I am trying to implement the 3D reconstruction part of the paper, but I'm running into problems with the SDS loss. Since the SDS loss is computed on the latents, I have to retain gradients through the VAE encoder, which is extremely memory-expensive (about 0.4-0.5 GB per frame, 21 frames in total). That works out to roughly 80-100 GB of GPU memory, which is hard to fit on a common GPU. Do you have any tricks to reduce the memory? Thanks a lot.
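(Not from the SV3D authors, just a general workaround.) Two standard tricks for this bottleneck are gradient checkpointing inside the encoder and encoding the 21 frames in small chunks, so only a few frames' activations live on the GPU at once. A minimal PyTorch sketch; the tiny `Conv2d` here is a toy stand-in for the real VAE encoder (e.g. a Diffusers-style `AutoencoderKL`, which is an assumption, not SV3D's exact API):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy stand-in for the VAE encoder; swap in the real encoder module.
# Checkpointing discards its internal activations on the forward pass
# and recomputes them during backward, trading compute for memory.
encoder = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)

def encode_frames_chunked(frames, chunk_size=3):
    """Encode frames a few at a time with gradient checkpointing."""
    latents = []
    for i in range(0, frames.shape[0], chunk_size):
        chunk = frames[i:i + chunk_size]
        # use_reentrant=False is the recommended checkpointing mode
        z = checkpoint(encoder, chunk, use_reentrant=False)
        latents.append(z)
    return torch.cat(latents, dim=0)

# 21 rendered frames; requires_grad because they come from the 3D renderer
frames = torch.randn(21, 3, 64, 64, requires_grad=True)
latents = encode_frames_chunked(frames)

# Stand-in for the SDS gradient w(t) * (eps_hat - eps), treated as a
# constant w.r.t. theta, injected directly at the latents
sds_grad = torch.randn_like(latents)
latents.backward(gradient=sds_grad)
print(frames.grad.shape)
```

Chunking bounds the peak activation memory by the chunk size rather than the full 21-frame batch, and checkpointing removes the encoder's intermediate activations entirely; together they usually cut encoder memory by an order of magnitude at the cost of one extra forward pass per chunk.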

yifliu3 avatar May 04 '24 20:05 yifliu3

same question

pengc02 avatar May 13 '24 15:05 pengc02

I have the same question. Do you have any solutions?

And I'm still confused about some details of SDS with SV3D. I would expect that we render 21 images of the 3D representation, add noise, and denoise them with SV3D. However, the paper says "We sample a random camera". Is it possible to add noise to and denoise a single image? I suspect that temporal attention trained on 21 frames won't work well on fewer frames (like 4-5). Do you have any tricks? Thanks.
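For reference, the standard SDS gradient (from DreamFusion; the per-frame vs. per-batch question above is about what $z$ ranges over) is

$$
\nabla_\theta \mathcal{L}_{\text{SDS}}
= \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t;\, t,\, c) - \epsilon\big)\, \frac{\partial z}{\partial \theta} \right],
$$

where $z = \mathcal{E}(x(\theta))$ is the VAE latent of the rendered view(s), $z_t$ its noised version, and $\hat{\epsilon}_\phi$ the denoiser's noise prediction. Because the U-Net Jacobian is dropped, the only term that needs autograd is $\partial z / \partial \theta$ through the encoder $\mathcal{E}$, which is exactly the memory bottleneck discussed above.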

fengq1a0 avatar Jun 13 '24 13:06 fengq1a0