TransGAN

A question about training

Open · Jamie-Cheung opened this issue 3 years ago • 3 comments

[Epoch 20/2558] [Batch 750/782] [D loss: 1.229028] [G loss: -36.288643] [ema: 0.999577]
100%|██████████████████████████████████████████████████████████| 782/782 [04:56<00:00, 2.64it/s]
100%|██████████████████████████████████████████████████████████| 782/782 [04:47<00:00, 2.72it/s]
INFO:functions:=> calculate inception score
=> calculate inception score
Inception score: 0
=> calculate fid score
0%| | 0/6250 [00:00<?, ?it/s]

I went back to training the experiment on 2 × 3090 GPUs, but it has been stuck at this point for several hours, and the two 3090s are running at only about half utilization. Is this expected? Thank you for your contribution; it has helped me a lot.

Jamie-Cheung, Mar 13 '22 11:03

Sorry, I cannot understand the description. Do you mean your program has been stuck for several hours?

yifanjiang19, Mar 15 '22 01:03

Yes, the code seems to be stuck at "=> calculate fid score 0%| | 0/6250 [00:00<?, ?it/s]", even though both the GPUs and the CPU are busy. Is this because two 3090s are not enough to calculate FID?

Jamie-Cheung, Mar 16 '22 05:03
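For context on the "=> calculate fid score" stage in the log: FID is usually computed by generating a large set of samples, extracting Inception features for them, fitting a Gaussian (mean mu, covariance S) to those features and to precomputed statistics of the real data, and then taking the Fréchet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). The NumPy sketch below covers only that final step and is not TransGAN's own implementation; the function names `frechet_distance` and `feature_stats` are hypothetical. Once the statistics exist this step is cheap, so the time is normally spent producing and embedding the samples, which is presumably what the 0/6250 progress bar counts.

```python
import numpy as np


def feature_stats(features):
    """Mean and covariance of an (n_samples, n_features) feature matrix."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2

    # Symmetric square root of sigma1 from its eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # sqrt_s1 @ sigma2 @ sqrt_s1 is symmetric PSD with the same eigenvalues as
    # sigma1 @ sigma2, so Tr((sigma1 sigma2)^(1/2)) = sum of sqrt(eigenvalues).
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = feature_stats(rng.normal(size=(1000, 8)))
    fake = feature_stats(rng.normal(loc=0.5, size=(1000, 8)))
    print(frechet_distance(*real, *real))  # approximately 0 (floating-point error only)
    print(frechet_distance(*real, *fake))
```

In a real evaluation the features would be the 2048-dimensional Inception pool activations, so the covariance matrices are 2048 × 2048; that is still a small computation compared with generating and embedding tens of thousands of images.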

I would suggest disabling FID calculation in the training program and launching a separate job for evaluation only.

yifanjiang19, Mar 20 '22 20:03
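The suggestion above amounts to decoupling training from evaluation: the training job only writes checkpoints, and a separate process (on another GPU, or later) computes FID on them. A minimal, framework-free sketch of that pattern follows. `train`, `evaluate_checkpoints`, `eval_every`, and the JSON checkpoint format are all hypothetical stand-ins, not TransGAN's actual options; a PyTorch version would save the generator with `torch.save` and reload it with `torch.load`.

```python
import json
import os
import tempfile


def train(num_epochs, ckpt_dir, eval_every=0, evaluate=None):
    """Toy training loop that writes one checkpoint per epoch.

    eval_every=0 disables in-loop evaluation, so training never blocks on FID.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    for epoch in range(1, num_epochs + 1):
        state = {"epoch": epoch}  # hypothetical stand-in for the generator's weights
        path = os.path.join(ckpt_dir, f"ckpt_{epoch:04d}.json")
        with open(path, "w") as f:
            json.dump(state, f)
        # Optional in-loop evaluation, only when explicitly enabled.
        if eval_every and evaluate is not None and epoch % eval_every == 0:
            evaluate(path)


def evaluate_checkpoints(ckpt_dir, score_fn, done=None):
    """Separate evaluation job: score every checkpoint that is not scored yet."""
    done = {} if done is None else done
    for name in sorted(os.listdir(ckpt_dir)):
        if name.startswith("ckpt_") and name not in done:
            with open(os.path.join(ckpt_dir, name)) as f:
                done[name] = score_fn(json.load(f))
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt_dir:
        train(num_epochs=3, ckpt_dir=ckpt_dir)  # no FID during training
        # Later, or in another process: score_fn would compute FID here.
        print(evaluate_checkpoints(ckpt_dir, score_fn=lambda s: s["epoch"]))
```

Keeping the training loop free of evaluation means a slow or hung FID step cannot stall training, and the evaluation job can be restarted on its own; passing the previous `done` dict back in skips checkpoints that were already scored.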