Question about the training parameters
Hi, thank you for sharing this great work.
I have a question about training parameters.
For depth generation, the authors expand the UNet's input and output channels, and for depth-guided multi-view attention they add an additional depth-guided attention module.
So, are the trainable parameters only the depth-guided attention modules, or the zero123 pre-trained weights plus the depth-guided attention modules?
Thank you.
Hello,
The trainable parameters are the zero123 pre-trained weights plus the depth-guided attention modules, because the extra input and output channels require modifying the zero123 pre-trained UNet. However, we do a second stage of fine-tuning where we freeze zero123 and train only the attention module, so we can fit more images in memory.
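For what it's worth, the second stage (freeze the pre-trained backbone, train only the new attention module) can be sketched in PyTorch roughly like below. The names `base_unet` and `depth_attn` are hypothetical stand-ins for the zero123 UNet and the depth-guided attention module, not the actual implementation:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `base_unet` plays the role of the zero123
# pre-trained UNet, `depth_attn` the added depth-guided attention module.
base_unet = nn.Sequential(
    nn.Conv2d(8, 16, 3, padding=1),  # expanded input channels (e.g. RGB + depth latents)
    nn.ReLU(),
    nn.Conv2d(16, 8, 3, padding=1),  # expanded output channels
)
depth_attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

# Stage 2: freeze the pre-trained backbone so only the new attention
# module receives gradients, which shrinks optimizer state and
# gradient memory and lets more images fit in a batch.
for p in base_unet.parameters():
    p.requires_grad = False

trainable = [p for p in depth_attn.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

In stage 1, by contrast, the optimizer would be built over both modules' parameters, since the expanded channels mean the pre-trained weights themselves must be updated.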