
How to reach FID 1.92 on ImageNet 64

Open Schwartz-Zha opened this issue 1 year ago • 8 comments

I downloaded the pretrained ImageNet 64 checkpoint and used the provided sampling commands (with slight modifications to make them run on my machine):

export OMPI_COMM_WORLD_RANK=0
export OMPI_COMM_WORLD_LOCAL_RANK=0
export OMPI_COMM_WORLD_SIZE=8

MODEL_FLAGS="--data_name=imagenet64 --class_cond=True --eval_interval=1000 --save_interval=1000 --num_classes=1000 --eval_batch=250 --eval_fid=True --eval_similarity=False --check_dm_performance=False --log_interval=100"

# CUDA_VISIBLE_DEVICES=0 mpiexec -n 1 python ./code/image_sample.py $MODEL_FLAGS --class_cond=True --num_classes=1000 --out_dir ./ctm-sample-paths/ctm_bs_1440/ --model_path=./ctm-runs/ctm_bs_1440/ema_0.999_006000.pt --training_mode=ctm --class_cond=True --eval_num_samples=6400 --batch_size=800 --device_id=0 --stochastic_seed=True --save_format=npz --ind_1=36 --ind_2=20 --use_MPI=True --sampler=exact --sampling_steps=1

CUDA_VISIBLE_DEVICES=0 mpiexec -n 1 python ./code/image_sample.py $MODEL_FLAGS --class_cond=True --num_classes=1000 --out_dir ./ctm-sample-paths/ctm_bs_1440_author/ --model_path=./ckpts/ema_0.999_049000.pt --training_mode=ctm --class_cond=True --eval_num_samples=6400 --batch_size=800 --device_id=0 --stochastic_seed=True --save_format=npz --ind_1=36 --ind_2=20 --use_MPI=True --sampler=exact --sampling_steps=1

And I evaluated the samples according to the provided instructions:

CUDA_VISIBLE_DEVICES=0 python code/evaluations/evaluator.py     ref-statistics/VIRTUAL_imagenet64_labeled.npz     ctm-sample-paths/ctm_bs_1440_author/ctm_exact_sampler_1_steps_049000_itrs_0.999_ema_/

But the measured performance is significantly worse:

Inception Score: 68.49456024169922
FID: 6.839029518808786
sFID: 22.465793478419187
Precision: 0.7965625
Recall: 0.6551
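(For reference, my understanding of what `evaluator.py` computes for FID: the Fréchet distance between Gaussian fits of the Inception features of the reference and generated sets,

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance of the reference and generated samples respectively.)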

How exactly can I reproduce FID 1.92, even with the pretrained model taken directly from the authors?

Schwartz-Zha avatar Jul 10 '24 21:07 Schwartz-Zha

I got similar results. I tried both QKVAttentionLegacy and XformersAttention, and I suspect that the issues raised here could be hurting results. Which form of attention did you use?

ehedlin avatar Dec 02 '24 02:12 ehedlin

I also realised that the inference code doesn't seem to use rejection sampling as described in the paper. This line seems to show that rejection sampling was run at a ratio of 10,000, except the code it references doesn't seem to exist in this repo. There is also this file with no documentation, but it seems the most promising.
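For anyone else digging into this, here is my rough mental model of classifier rejection. This is a generic sketch, not the repo's code; the scoring function and the meaning of "ratio" are my assumptions (I read "ratio of N" as keeping the top 1/N of candidates by classifier score), and `classifier_rejection.py` may well do something different:

```python
import random

def classifier_rejection(samples, scores, keep_ratio):
    """Keep the top len(samples)/keep_ratio samples ranked by classifier score.

    `samples` are generated images and `scores` are the classifier's
    confidences for the conditioning class. One plausible reading of
    "rejection at a ratio of N": oversample, then keep the best 1/N.
    """
    n_keep = max(1, len(samples) // keep_ratio)
    ranked = sorted(zip(scores, range(len(samples))), reverse=True)
    return [samples[i] for _, i in ranked[:n_keep]]

# Toy usage: 1000 dummy "samples" with random scores, keep the top 1/10.
random.seed(0)
pool = list(range(1000))
scores = [random.random() for _ in pool]
kept = classifier_rejection(pool, scores, keep_ratio=10)
print(len(kept))  # 100
```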

ehedlin avatar Dec 02 '24 04:12 ehedlin

I am on a newer NVIDIA GPU (an L40S), so hardware support for attention and xformers is not a concern. Did you manage to merge the rejection sampling into the evaluation pipeline?

Schwartz-Zha avatar Dec 02 '24 09:12 Schwartz-Zha

I ran the classifier rejection code, but it seemed to produce similar results, so I emailed the authors to ask about the difference in performance.

ehedlin avatar Dec 02 '24 19:12 ehedlin

Sorry, I am a newbie here. Do you mind pasting the command you used to run code/classifier_rejection.py? I literally have no idea how to do that. CTM claims to use this sampling strategy to achieve a better result, and I am confused about how to reproduce it.

Schwartz-Zha avatar Dec 03 '24 09:12 Schwartz-Zha

I was able to get the published performance by passing --eval_num_samples=50000 when generating samples (the default is 6400). I'm assuming that's what was intended, since that number seems to be hard-coded into the evaluation script. Put another way, the quality of the generated samples doesn't seem to be the problem; the FID is just inflated when it is computed against fewer samples.
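To convince myself of this small-sample bias, I tried a toy check. This is pure Python with 1-D Gaussian "features" standing in for Inception activations, so the numbers mean nothing in absolute terms; it only illustrates that the Fréchet distance between two draws from the *same* distribution is biased above zero and shrinks as the sample count grows:

```python
import random
import statistics

def fid_1d(xs, ys):
    # Fréchet distance between 1-D Gaussian fits of the two sample sets:
    # (mu1 - mu2)^2 + v1 + v2 - 2*sqrt(v1*v2)
    m1, m2 = statistics.fmean(xs), statistics.fmean(ys)
    v1, v2 = statistics.pvariance(xs), statistics.pvariance(ys)
    return (m1 - m2) ** 2 + v1 + v2 - 2 * (v1 * v2) ** 0.5

def avg_fid(n, trials=200, seed=0):
    # Average FID over many trials of two size-n draws from N(0, 1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        total += fid_1d(xs, ys)
    return total / trials

small, large = avg_fid(100), avg_fid(5000)
print(small, large)  # the small-n estimate is noticeably larger
```

Same data distribution in both cases, yet the 100-sample FID comes out much larger than the 5000-sample one, which is consistent with 6400 samples inflating the score relative to 50000.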

ehedlin avatar Dec 04 '24 03:12 ehedlin

Ah, I see. That makes perfect sense.

By the way, do you still need classifier rejection sampling after changing to 50000 samples?

Schwartz-Zha avatar Dec 04 '24 07:12 Schwartz-Zha

No, I just used code/image_sample.py and code/evaluations/evaluator.py.

ehedlin avatar Dec 05 '24 03:12 ehedlin