
Question about the training parameters

Open • ug-kim opened this issue 1 year ago • 1 comment

Hi, thank you for sharing this great work.

I have a question about training parameters.

For depth generation, the authors expand the UNet input and output channels. For depth-guided multi-view attention, the authors add a depth-guided attention module.

So, are the trainable parameters only the depth-guided attention modules, or the pre-trained zero123 weights plus the depth-guided attention modules?

Thank you.

ug-kim, Nov 08 '24 14:11

Hello,

The trainable parameters are the pre-trained zero123 weights plus the depth-guided attention modules, because the extra input and output channels added to the pre-trained zero123 weights also need to be trained. However, we also run a second stage of finetuning in which we freeze zero123 and train only the attention modules, so that we can fit more images in memory.

zhizdev, Nov 08 '24 15:11
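
For readers who want to see what the two-stage schedule described in the reply could look like in practice, here is a minimal PyTorch sketch. It is not taken from the mvdfusion code: `ToyDenoiser`, `backbone`, `depth_attn`, `set_stage`, the channel counts, and the learning rate are all hypothetical stand-ins, with `backbone` playing the role of the zero123 pre-trained weights and `depth_attn` the added attention module.

```python
import torch
from torch import nn


class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a pre-trained UNet plus an added attention module."""

    def __init__(self, in_ch: int = 5, out_ch: int = 5, dim: int = 16):
        super().__init__()
        # "backbone" stands in for the pre-trained weights, with input and
        # output convolutions widened to carry extra channels (assumed sizes).
        self.backbone = nn.ModuleDict({
            "conv_in": nn.Conv2d(in_ch, dim, 3, padding=1),
            "conv_out": nn.Conv2d(dim, out_ch, 3, padding=1),
        })
        # "depth_attn" stands in for the added attention module.
        self.depth_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.backbone["conv_in"](x)
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn_out, _ = self.depth_attn(tokens, tokens, tokens)
        h = h + attn_out.transpose(1, 2).reshape(b, c, hh, ww)  # residual add
        return self.backbone["conv_out"](h)


def set_stage(model: ToyDenoiser, stage: int) -> list:
    """Stage 1 trains everything; stage 2 freezes the backbone and trains attention only."""
    for p in model.backbone.parameters():
        p.requires_grad = (stage == 1)
    for p in model.depth_attn.parameters():
        p.requires_grad = True
    # Return only the parameters the optimizer should update in this stage.
    return [p for p in model.parameters() if p.requires_grad]


model = ToyDenoiser()
stage1_params = set_stage(model, stage=1)  # backbone + attention
stage2_params = set_stage(model, stage=2)  # attention only
optimizer = torch.optim.AdamW(stage2_params, lr=1e-5)  # lr is an arbitrary placeholder
```

In a setup like this, the frozen backbone needs no gradients or optimizer state in the second stage, which is one plausible reason the freed memory could go toward more images per batch.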