
Difference Between Testing Results and Paper Results

Open Iris1946 opened this issue 1 year ago • 3 comments

Hello, I encountered a significant gap from the paper when reproducing the results for the t2m task; a similar gap was reported in issue #49.

Below is a comparison of the results:

|                       | R_top1 | R_top2 | R_top3 | FID   | MMDist | Diversity | MModality |
|-----------------------|--------|--------|--------|-------|--------|-----------|-----------|
| ground truth          | 0.511  | 0.702  | 0.798  | 0.002 | 2.967  | 9.422     | -         |
| paper result          | 0.492  | 0.681  | 0.778  | 0.232 | 3.096  | 9.528     | 2.008     |
| my result (last.ckpt) | 0.322  | 0.467  | 0.554  | 0.491 | 4.738  | 9.231     | 4.673     |
| my result (mgpt.tar)  | 0.402  | 0.568  | 0.659  | 0.185 | 4.019  | 9.294     | 3.501     |
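For context on what the table measures: R_top-k is retrieval accuracy, where each text query ranks the motion features in its evaluation batch and counts as a hit if its own matched motion lands in the top k. A minimal NumPy sketch of that computation (function name and batch handling are illustrative, not the repo's actual evaluator):

```python
import numpy as np

def r_precision(text_emb: np.ndarray, motion_emb: np.ndarray, max_k: int = 3):
    """Top-k retrieval accuracy for matched text/motion embedding pairs.

    text_emb, motion_emb: (B, D) arrays where row i of each is a matched pair.
    Returns [R_top1, ..., R_top{max_k}].
    """
    # Pairwise Euclidean distances: dists[i, j] = ||text_i - motion_j||
    dists = np.linalg.norm(text_emb[:, None, :] - motion_emb[None, :, :], axis=-1)
    ranks = dists.argsort(axis=1)                      # (B, B), closest motion first
    hit = ranks == np.arange(len(text_emb))[:, None]   # True where the matched motion sits
    return [float(hit[:, :k].any(axis=1).mean()) for k in range(1, max_k + 1)]
```

The HumanML3D evaluation protocol computes this over batches of 32 pairs and averages across batches, so scores computed over a different pool size are not directly comparable.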

First, I tried to train my own MotionGPT models and set the parameters in config_h3d_stage3.yaml as follows:

TRAIN:
  PRETRAINED: 'experiments/mgpt/Pretrain_HumanML3D/checkpoints/last.ckpt' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path

TEST:
  CHECKPOINTS: experiments/mgpt/Instruct_HumanML3D/checkpoints/last.ckpt

The results differed significantly from those reported in the paper. Therefore, I attempted to use the checkpoint downloaded from Hugging Face and set the parameters in config_h3d_stage3.yaml as follows:

TRAIN:
  PRETRAINED: 'checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path

TEST:
  CHECKPOINTS: checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar
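One thing worth checking when `last.ckpt` and the Hugging Face `.tar` give different scores is whether all weights are actually being loaded: Lightning checkpoints nest parameters under a `state_dict` key, while a raw `torch.save`'d file may not. A small sketch for comparing the two (assuming both files load with `torch.load`; function names are hypothetical):

```python
import torch

def load_state_dict(path: str):
    """Return the flat parameter dict from either checkpoint flavor.

    Lightning .ckpt files nest weights under a "state_dict" key; a plain
    torch.save'd file may already be the state dict itself.
    """
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt

def diff_keys(a: dict, b: dict):
    """Parameter names present in one checkpoint but not the other."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

If `diff_keys` on the two checkpoints reports missing entries, some modules are silently falling back to their initialization, which would explain a large metric gap.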

However, I was still unable to reproduce the results from the paper😭

The attached testing logs are: mgpt_tar.log, last_ckpt.log.

  • To reproduce the paper's results, do I need to adjust any other hyperparameters in config_h3d_stage3.yaml?
  • If possible, could you provide more detailed model configurations, data processing steps, and testing scripts?
  • Are there any other factors that might influence the testing results that I should be aware of?

Thank you very much for your assistance😊

Iris1946 avatar Jun 05 '24 11:06 Iris1946

Hello, I got similar results when reproducing this. With the provided checkpoint, R_top1 is only about 0.4, and the model I retrained based on T5 only reaches a little over 0.3. Have you found a solution?

www-Ye avatar Jul 30 '24 11:07 www-Ye

Same issue. Looking for a solution.

Lyman-Smoker avatar Mar 20 '25 04:03 Lyman-Smoker

The training log in your attachment seems to be inaccessible; could you post it directly? In my second training stage over 300 epochs, the evaluation metrics fluctuated significantly, so there may be an error somewhere.

Seven2455 avatar Apr 12 '25 09:04 Seven2455