[Bug] checkpointing with momentum leads to parameters inconsistency in BLIP

Open yingfhu opened this issue 3 years ago • 1 comments

I found that during BLIP COCO retrieval finetuning, vit_grad_ckpt is True, which means checkpointing is used in the ViT and a weakref is used in the replaced forward function: https://github.com/salesforce/LAVIS/blob/main/lavis/models/vit.py#L152

However, the momentum encoder is created through deepcopy, which copies the replaced forward function together with its weak reference.

Therefore, after self._momentum_update(), the params of visual_encoder_m are updated, but the weak reference inside the checkpointing layers is not: it still refers to the params of visual_encoder. That causes a parameter inconsistency in those checkpointing layers.

So the checkpointing layers in visual_encoder_m always use the params of visual_encoder in every forward pass.
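
To make the mechanism concrete, here is a minimal, torch-free sketch. It assumes the checkpoint wrapper stores `weakref.ref(module)` inside a `functools.partial` that replaces `forward`. `Layer`, `wrap_forward` and `_wrapped_forward` are hypothetical stand-ins, not the actual LAVIS code, and setting `weight` by hand stands in for `_momentum_update()`.

```python
import copy
import functools
import weakref


class Layer:
    """Toy stand-in for a ViT block with a single scalar parameter."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return self.weight * x


def _wrapped_forward(original_forward, module_ref, x):
    # Resolve the weak reference at call time and run the original forward
    # on whatever module the reference points to.
    module = module_ref()
    return original_forward(module, x)


def wrap_forward(layer):
    # Hypothetical helper: replace forward with a partial that only holds a
    # weak reference to the layer (assumed to mirror the checkpoint wrapper).
    layer.forward = functools.partial(
        _wrapped_forward, type(layer).forward, weakref.ref(layer)
    )
    return layer


encoder = wrap_forward(Layer(weight=1.0))

# copy.deepcopy treats weakref.ref as atomic, so the copied partial still
# points at `encoder`, not at the new object.
encoder_m = copy.deepcopy(encoder)

# Stand-in for _momentum_update(): only the copy's parameter changes.
encoder_m.weight = 5.0

stale = encoder_m.forward(2.0)            # computed with encoder.weight
expected = Layer.forward(encoder_m, 2.0)  # what the momentum copy should give

print(stale, expected)
print(encoder_m.forward.args[1]() is encoder)
```

In this sketch the wrapped forward of the copy returns the result for the original's weight, while calling the unwrapped forward on the copy uses the updated weight. Under the same assumption, re-applying the wrapper to the copy after deepcopy would retarget the weak reference.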

Is this an intended behavior that you chose deliberately to improve the metrics, is it just a bug, or does it not affect the metrics?

Feb 09 '23 05:02 yingfhu

Thanks for outlining the issue.

No, this is not the expected behavior, though the current implementation in LAVIS gives results close to (almost identical to) the original.

I'll investigate this issue and try to resolve it.

Feb 09 '23 06:02 dxli94