
Difference Between Testing Results and Paper Results

Open Iris1946 opened this issue 1 year ago • 3 comments

Hello, I encountered a significant gap from the paper when reproducing the results for the t2m task; a similar gap was reported in issue #49.

Below is a comparison of the results:

|                       | R_top1 | R_top2 | R_top3 | FID   | MMDist | Diversity | MModality |
|-----------------------|--------|--------|--------|-------|--------|-----------|-----------|
| ground truth          | 0.511  | 0.702  | 0.798  | 0.002 | 2.967  | 9.422     | -         |
| paper result          | 0.492  | 0.681  | 0.778  | 0.232 | 3.096  | 9.528     | 2.008     |
| my result (last.ckpt) | 0.322  | 0.467  | 0.554  | 0.491 | 4.738  | 9.231     | 4.673     |
| my result (mgpt.tar)  | 0.402  | 0.568  | 0.659  | 0.185 | 4.019  | 9.294     | 3.501     |
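For context on what the table measures: R_top-k is retrieval accuracy, where each text query ranks the motion features in its evaluation batch and counts as a hit if its own matched motion lands in the top k. A minimal NumPy sketch of that computation (function name and batch handling are illustrative, not the repo's actual evaluator):

```python
import numpy as np

def r_precision(text_emb: np.ndarray, motion_emb: np.ndarray, max_k: int = 3):
    """Top-k retrieval accuracy for matched text/motion embedding pairs.

    text_emb, motion_emb: (B, D) arrays where row i of each is a matched pair.
    Returns [R_top1, ..., R_top{max_k}].
    """
    # Pairwise Euclidean distances: dists[i, j] = ||text_i - motion_j||
    dists = np.linalg.norm(text_emb[:, None, :] - motion_emb[None, :, :], axis=-1)
    ranks = dists.argsort(axis=1)                      # (B, B), closest motion first
    hit = ranks == np.arange(len(text_emb))[:, None]   # True where the matched motion sits
    return [float(hit[:, :k].any(axis=1).mean()) for k in range(1, max_k + 1)]
```

The HumanML3D evaluation protocol computes this over batches of 32 pairs and averages across batches, so scores computed over a different pool size are not directly comparable.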

First, I tried to train my own MotionGPT models and set the parameters in config_h3d_stage3.yaml as follows:

TRAIN:
  PRETRAINED: 'experiments/mgpt/Pretrain_HumanML3D/checkpoints/last.ckpt' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path

TEST:
  CHECKPOINTS: experiments/mgpt/Instruct_HumanML3D/checkpoints/last.ckpt

The results differed significantly from those reported in the paper. Therefore, I attempted to use the checkpoint downloaded from Hugging Face and set the parameters in config_h3d_stage3.yaml as follows:

TRAIN:
  PRETRAINED: 'checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path

TEST:
  CHECKPOINTS: checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar
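One thing worth checking when `last.ckpt` and the Hugging Face `.tar` give different scores is whether all weights are actually being loaded: Lightning checkpoints nest parameters under a `state_dict` key, while a raw `torch.save`'d file may not. A small sketch for comparing the two (assuming both files load with `torch.load`; function names are hypothetical):

```python
import torch

def load_state_dict(path: str):
    """Return the flat parameter dict from either checkpoint flavor.

    Lightning .ckpt files nest weights under a "state_dict" key; a plain
    torch.save'd file may already be the state dict itself.
    """
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt

def diff_keys(a: dict, b: dict):
    """Parameter names present in one checkpoint but not the other."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

If `diff_keys` on the two checkpoints reports missing entries, some modules are silently falling back to their initialization, which would explain a large metric gap.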

However, I was still unable to reproduce the results from the paper😭

The attached testing logs are: mgpt_tar.log, last_ckpt.log.

  • To reproduce the paper's results, do I need to adjust any other hyperparameters in config_h3d_stage3.yaml?
  • If possible, could you provide more detailed model configurations, data processing steps, and testing scripts?
  • Are there any other factors that might influence the testing results that I should be aware of?

Thank you very much for your assistance😊

Iris1946 avatar Jun 05 '24 11:06 Iris1946

Hello, I got similar results when reproducing this. With the provided checkpoint, R_top1 is only about 0.4, and the model I retrained based on T5 only reaches a little over 0.3. Have you found a solution?

www-Ye avatar Jul 30 '24 11:07 www-Ye

Same issue. Looking for a solution.

Lyman-Smoker avatar Mar 20 '25 04:03 Lyman-Smoker

The training log in your attachment seems to be inaccessible; could you post it directly? In my second training stage over 300 epochs, the evaluation metrics fluctuated significantly, so there may be an error somewhere.

Seven2455 avatar Apr 12 '25 09:04 Seven2455