[QUESTION/HELP] about ignore_unused_parameters hang
Hello, I would like to ask for assistance in solving a problem I've encountered.
I am currently training an MLLM with DeepSpeed, and I have added a new modality on top of the existing ones. However, not every sample in a batch includes this new modality. Below is the code I currently use to handle this situation:
modality_images_filtered = [modality_image for modality_image in modality_images if modality_image is not None]
if len(modality_images_filtered) > 0:
    concat_modality_images = torch.cat(modality_images_filtered, dim=0)
    modality_image_features = self.modality_encode_images(concat_modality_images, prompts, modality_image_counts)
else:
    # all None here, modality_image_features: [Tensor]
    modality_image_features = modality_images
The intent is that only the non-None inputs are passed through the encoder. The trainable parameters inside self.modality_encode_images are the projection and the qformer for this modality.
The issue is this: when a batch contains data for this modality, training proceeds normally and the parameters in self.modality_encode_images receive gradients and are updated as expected. However, when every instance of this modality in a batch is None, the module is skipped entirely, so its parameters receive no gradients on that rank, and training hangs (I assume because the gradient reduction across ranks no longer covers the same parameter set).
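For context, one workaround I have seen suggested for this class of hang is the "zero-loss" trick: on ranks where the modality is absent, touch the module's parameters with a zero-weighted term so every rank produces gradients for the same parameter set. A minimal sketch, assuming loss is the usual training loss and self.modality_projection / self.modality_qformer are hypothetical names for the trainable submodules used by self.modality_encode_images:

# Zero-loss workaround (sketch): keep the skipped module's parameters in the
# autograd graph even when this rank has no real inputs for the modality.
modality_modules = [self.modality_projection, self.modality_qformer]  # hypothetical attribute names

if len(modality_images_filtered) == 0:
    # Zero-weighted reduction over every trainable parameter: the gradients
    # are exactly zero, but they exist, so the cross-rank gradient reduction
    # under ZeRO-2 sees the same parameter set on every rank.
    dummy = sum(
        p.sum()
        for m in modality_modules
        for p in m.parameters()
        if p.requires_grad
    )
    loss = loss + 0.0 * dummy

I have not confirmed whether this is the intended fix here, which is partly why I am asking.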
I am using DeepSpeed 0.12.3 (and also tried 0.14.0) with ZeRO stage 2 and the Hugging Face Transformers trainer. After consulting the documentation, I tried both setting the DeepSpeed option "ignore_unused_parameters": true and passing the Transformers trainer flag --ddp_find_unused_parameters True, but neither approach resolved the hang.
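For reference, the relevant part of my DeepSpeed config looks roughly like this (a sketch with other fields omitted; per the DeepSpeed docs, ignore_unused_parameters sits under zero_optimization, and the Hugging Face trainer also accepts such a dict via the deepspeed training argument):

# Sketch of the ZeRO section of the DeepSpeed config (other fields omitted);
# this dict can be passed as TrainingArguments(deepspeed=ds_config).
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "ignore_unused_parameters": True,
    },
}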
I'm wondering if the setup might be incorrect, or if there's an inherent flaw in my code logic. Any advice and help on this matter would be greatly appreciated.
same question +1