Loss / metric curves
Could the authors, or anyone else who has successfully reproduced the training of MotionGPT, share your training-loss and R_TOP_3 (or R_TOP_1/R_TOP_2) curves so I can check whether my run is on track? After about 20 epochs, my R_TOP_{1/2/3} are essentially flat (R_TOP_3 < 0.1), even though the training loss keeps decreasing.
I am facing the same problem.
I found this is caused by multi-GPU training.
Change
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L76
to
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
and
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L134-L135
to
all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]
Apply the same change to text_embeddings and gtmotion_embeddings, and the problem should be solved.
Hello, I had the same problem. Could you point out which other lines need similar changes? I still see the same issue after modifying that line.
@lixiang927047
# Cached batches
self.add_state("text_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("gtmotion_embeddings", default=[], dist_reduce_fx="cat")
and
if type(self.recmotion_embeddings) == list:
    all_genmotions = torch.cat(self.recmotion_embeddings,
                               axis=0).cpu()[shuffle_idx, :]
else:
    all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]
if type(self.gtmotion_embeddings) == list:
    all_gtmotions = torch.cat(self.gtmotion_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
else:
    all_gtmotions = self.gtmotion_embeddings.cpu()[shuffle_idx, :]

# Compute text related metrics
if self.text:
    if type(self.text_embeddings) == list:
        all_texts = torch.cat(self.text_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
    else:
        all_texts = self.text_embeddings.cpu()[shuffle_idx, :]
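To illustrate why both branches are needed: in a single-process run the torchmetrics state stays a Python list of per-batch tensors, while with multi-GPU and dist_reduce_fx="cat" it arrives in compute() already concatenated into one tensor. Here is a minimal standalone sketch of that pattern (the helper name gather_state and the dummy tensors are my own, not from the MotionGPT code):

```python
import torch

def gather_state(state, shuffle_idx):
    """Handle a metric state that is either a list of per-batch tensors
    (single-process run, no distributed sync yet) or a single tensor that
    dist_reduce_fx="cat" has already concatenated across processes."""
    if isinstance(state, list):
        all_embeddings = torch.cat(state, dim=0).cpu()
    else:
        all_embeddings = state.cpu()
    return all_embeddings[shuffle_idx, :]

# Single-process: the state is still a list of per-batch tensors.
batches = [torch.ones(2, 4), torch.zeros(3, 4)]
idx = torch.randperm(5)
print(gather_state(batches, idx).shape)  # torch.Size([5, 4])

# Multi-GPU: the list has already been concatenated into one tensor.
synced = torch.cat(batches, dim=0)
print(gather_state(synced, idx).shape)  # torch.Size([5, 4])
```

Both code paths produce the same shuffled embedding matrix, which is why adding the isinstance check makes the metric work in both settings.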
Hello, could you share your final training loss and metrics? I am also curious whether my training is going well.
Many thanks for your help in advance.