Difference Between Testing Results and Paper Results
Hello, I encountered a significant difference when reproducing the results for the t2m (text-to-motion) task; a similar discrepancy was reported in Issue 49.
Below is a comparison of the results:
| | R_top1 | R_top2 | R_top3 | FID | MMDist | Diversity | MModality |
|---|---|---|---|---|---|---|---|
| ground truth | 0.511 | 0.702 | 0.798 | 0.002 | 2.967 | 9.422 | - |
| paper result | 0.492 | 0.681 | 0.778 | 0.232 | 3.096 | 9.528 | 2.008 |
| my result (last.ckpt) | 0.322 | 0.467 | 0.554 | 0.491 | 4.738 | 9.231 | 4.673 |
| my result (mgpt.tar) | 0.402 | 0.568 | 0.659 | 0.185 | 4.019 | 9.294 | 3.501 |
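For reference, my understanding is that R_topk in this benchmark follows the standard T2M evaluator protocol (Guo et al. 2022): each motion embedding is ranked against a batch of 32 text embeddings (1 matched + 31 mismatched) by Euclidean distance. A minimal sketch of that protocol — `r_precision` and its inputs are my own illustration, not repo code:

```python
import torch

# Minimal sketch of the R_topk protocol from the T2M evaluator, as I
# understand it: each motion embedding is ranked against a batch of 32 text
# embeddings (1 matched + 31 mismatched) by Euclidean distance.
# `text_emb` and `motion_emb` stand for the pretrained evaluator's outputs;
# row i of each tensor is assumed to be a matched pair.
def r_precision(text_emb: torch.Tensor, motion_emb: torch.Tensor, top_k: int = 3):
    assert text_emb.shape == motion_emb.shape and text_emb.shape[0] == 32
    dist = torch.cdist(motion_emb, text_emb)        # (32, 32) pairwise distances
    ranks = dist.argsort(dim=1)                     # closest text first
    match = ranks == torch.arange(32).unsqueeze(1)  # True where the matched text landed
    # R_top1, R_top2, R_top3: fraction of motions whose matched text is in the top k
    return [match[:, :k + 1].any(dim=1).float().mean().item() for k in range(top_k)]
```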
First, I trained my own MotionGPT model and set the parameters in `config_h3d_stage3.yaml` as follows:
```yaml
TRAIN:
  PRETRAINED: 'experiments/mgpt/Pretrain_HumanML3D/checkpoints/last.ckpt' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path
TEST:
  CHECKPOINTS: experiments/mgpt/Instruct_HumanML3D/checkpoints/last.ckpt
```
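To rule out a silently failed checkpoint load on my side, I first sanity-checked the file with a small helper (my own snippet, not part of MotionGPT):

```python
import torch

# Sanity check (my own snippet, not part of MotionGPT): make sure the
# checkpoint actually loads and inspect its top-level keys, to rule out a
# silent or partial load.
ckpt = torch.load(
    "experiments/mgpt/Instruct_HumanML3D/checkpoints/last.ckpt",
    map_location="cpu",
)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt)[:10])              # e.g. 'state_dict', 'epoch', ...
    sd = ckpt.get("state_dict", ckpt)   # unwrap a Lightning-style container
    print(len(sd), "entries in state_dict")
```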
The results differed significantly from those reported in the paper, so I then tried the checkpoint downloaded from Hugging Face and set the parameters in `config_h3d_stage3.yaml` as follows:
```yaml
TRAIN:
  PRETRAINED: 'checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar' # Pretrained model path
  PRETRAINED_VAE: experiments/mgpt/VQVAE_HumanML3D/checkpoints/last.ckpt # VAE model path
TEST:
  CHECKPOINTS: checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar
```
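Since the Hugging Face file is a .tar while my retrained one is a Lightning .ckpt, I also compared their parameter names, on the assumption that a key/prefix mismatch could make the weights load only partially. Again a hypothetical helper of mine, assuming both files are torch-serialized dicts:

```python
import torch

# Hypothetical helper (not from the repo): compare parameter names between
# the Hugging Face .tar checkpoint and my retrained .ckpt. A prefix mismatch
# would make the weights load only partially, which could explain the
# degraded metrics.
a = torch.load("checkpoints/MotionGPT-base/motiongpt_s3_h3d.tar", map_location="cpu")
b = torch.load("experiments/mgpt/Instruct_HumanML3D/checkpoints/last.ckpt", map_location="cpu")

sd_a = a.get("state_dict", a)  # unwrap Lightning-style containers if present
sd_b = b.get("state_dict", b)
only_a = set(sd_a) - set(sd_b)
only_b = set(sd_b) - set(sd_a)
print(f"{len(only_a)} keys only in .tar, {len(only_b)} keys only in .ckpt")
for k in sorted(only_a)[:5]:
    print("tar only:", k)
```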
However, I was still unable to reproduce the results from the paper😭
The attachments are the testing logs: mgpt_tar.log, last_ckpt.log.
- To reproduce the results of the paper, do I need to adjust any other hyperparameters in `config_h3d_stage3.yaml`?
- If possible, could you provide more detailed model configurations, data processing steps, and testing scripts?
- Are there any other factors that might influence the testing results that I should be aware of?
Thank you very much for your assistance😊
Hello, I got similar results when reproducing this. With the released checkpoint, R_top1 is only about 0.4, and the model I retrained based on T5 reaches only a little over 0.3. Have you found a solution?
Same issue. Looking for a solution.
The training logs in your attachment seem to be inaccessible; could you please post them directly? In my second training stage, over 300 epochs, the evaluation metrics fluctuated significantly, so there might be some errors.
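For what it's worth, some fluctuation is expected: these HumanML3D metrics are usually reported as a mean over roughly 20 evaluation replications with a 95% confidence interval, not from a single pass. A minimal aggregation sketch (my own helper; the example numbers are made up):

```python
import numpy as np

# These benchmarks are normally reported as the mean over ~20 evaluation
# replications with a 95% confidence interval, so a single run can swing a
# lot. Hypothetical aggregation helper (not from the repo):
def summarize(runs):
    runs = np.asarray(runs, dtype=float)
    mean = runs.mean()
    ci = 1.96 * runs.std() / np.sqrt(len(runs))  # interval style used in the T2M eval code
    return f"{mean:.3f} +/- {ci:.3f}"

# Example with made-up per-run R_top1 values:
print(summarize([0.492, 0.487, 0.495, 0.489, 0.491]))
```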