
Loss / metric curves

Open sanjayss34 opened this issue 2 years ago • 5 comments

Could the authors, or someone else who has successfully reproduced the training of MotionGPT, share your train loss and R_TOP_3 (or R_TOP_1/R_TOP_2) curves so I can check whether my training is progressing as expected? After about 20 epochs, R_TOP_{1/2/3} are essentially flat (R_TOP_3 < 0.1), even though the training loss is still decreasing.

sanjayss34 avatar Nov 26 '23 17:11 sanjayss34

I am facing the same problem.

YU1ut avatar Nov 28 '23 13:11 YU1ut

I found this is due to multi-GPU training. Change https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L76 to

    self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")

and https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L134-L135 to

    all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]

Do the same for `text_embeddings` and `gtmotion_embeddings`, and the problem is solved.
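For anyone wondering why both branches are needed: torchmetrics states declared with `default=[]` stay a Python list of per-batch tensors on a single process, but after a distributed sync with `dist_reduce_fx="cat"` the state arrives as one already-concatenated tensor. A minimal sketch of that pattern, with no torch dependency (`FakeTensor`, `cat`, and `collect` are hypothetical stand-ins for illustration only):

```python
# Sketch of the list-vs-tensor branch required in the fix.
# torchmetrics states declared with default=[] are a list of
# per-batch tensors on one process; after distributed sync with
# dist_reduce_fx="cat" they become a single concatenated tensor.

class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; holds a flat list."""
    def __init__(self, values):
        self.values = list(values)
    def __eq__(self, other):
        return self.values == other.values

def cat(tensors):
    """Stand-in for torch.cat along dim 0."""
    merged = []
    for t in tensors:
        merged.extend(t.values)
    return FakeTensor(merged)

def collect(state):
    """Handle both shapes of a metric state: a list of per-batch
    tensors (single GPU) or one concatenated tensor (multi-GPU
    with dist_reduce_fx="cat")."""
    if isinstance(state, list):
        return cat(state)
    return state

single_gpu = [FakeTensor([1, 2]), FakeTensor([3, 4])]  # per-batch list
multi_gpu = FakeTensor([1, 2, 3, 4])                   # after "cat" sync
assert collect(single_gpu) == collect(multi_gpu)
```

The same branching must be applied to each of the three embedding states, which is why changing only one of them does not fix the metrics.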

YU1ut avatar Feb 16 '24 07:02 YU1ut


Hello, I had the same problem. Could you point out which other lines need similar changes? I still see the same issue after making the modification above.

Coco-XiangLi avatar Feb 20 '24 16:02 Coco-XiangLi

@lixiang927047

        # Cached batches
        self.add_state("text_embeddings", default=[], dist_reduce_fx="cat")
        self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
        self.add_state("gtmotion_embeddings", default=[], dist_reduce_fx="cat")

and

        if isinstance(self.recmotion_embeddings, list):
            all_genmotions = torch.cat(self.recmotion_embeddings,
                                       axis=0).cpu()[shuffle_idx, :]
        else:
            all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]

        if isinstance(self.gtmotion_embeddings, list):
            all_gtmotions = torch.cat(self.gtmotion_embeddings,
                                      axis=0).cpu()[shuffle_idx, :]
        else:
            all_gtmotions = self.gtmotion_embeddings.cpu()[shuffle_idx, :]

        # Compute text-related metrics
        if self.text:
            if isinstance(self.text_embeddings, list):
                all_texts = torch.cat(self.text_embeddings,
                                      axis=0).cpu()[shuffle_idx, :]
            else:
                all_texts = self.text_embeddings.cpu()[shuffle_idx, :]

YU1ut avatar Feb 21 '24 05:02 YU1ut


Hello, could you share your final training loss and metric values? I am also curious whether my training is going well.

Many thanks for your help in advance.

Lyman-Smoker avatar Mar 19 '25 03:03 Lyman-Smoker