A question about training
[Epoch 20/2558] [Batch 750/782] [D loss: 1.229028] [G loss: -36.288643] [ema: 0.999577]
100%|██████████████████████████████████████████████████████████| 782/782 [04:56<00:00, 2.64it/s]
100%|██████████████████████████████████████████████████████████| 782/782 [04:47<00:00, 2.72it/s]
INFO:functions:=> calculate inception score
=> calculate inception score
Inception score: 0
=> calculate fid score
0%| | 0/6250 [00:00<?, ?it/s]
I went back to training the experiment on 2 * 3090, but it has been stuck at this point for several hours, and the two 3090s are running at only about half efficiency. Is this expected? Thank you for your contribution; it has helped me a lot.
Sorry, I don't quite understand the description. Do you mean your program has been stuck for several hours?
Yes, the code seems to be stuck (=> calculate fid score 0%| | 0/6250 [00:00<?, ?it/s]), although the GPU and CPU are both in use. Is it because 2 * 3090 is not enough to calculate FID?
I would suggest disabling the FID calculation during training and launching a separate job for evaluation only.
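
A minimal sketch of that split, assuming the training loop writes a checkpoint to disk every epoch. The names `train`, `evaluate_checkpoints`, and `dummy_metric`, as well as the JSON checkpoint format, are hypothetical and not taken from this repository; `dummy_metric` is only a stand-in for the real FID computation.

```python
import json
from pathlib import Path


def train(num_epochs, ckpt_dir, eval_fn=None):
    """Toy training loop: saves a checkpoint every epoch.

    Evaluation runs only if eval_fn is passed, so the expensive
    metric is skipped during training by default.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for epoch in range(num_epochs):
        # Placeholder for the real model/optimizer state (assumption).
        state = {"epoch": epoch, "weights": [0.1 * epoch]}
        path = ckpt_dir / f"checkpoint_{epoch:04d}.json"
        path.write_text(json.dumps(state))
        saved.append(path)
        if eval_fn is not None:
            eval_fn(state)
    return saved


def evaluate_checkpoints(ckpt_dir, metric_fn):
    """Separate evaluation job: scores every checkpoint found on disk."""
    results = {}
    for path in sorted(Path(ckpt_dir).glob("checkpoint_*.json")):
        state = json.loads(path.read_text())
        results[state["epoch"]] = metric_fn(state)
    return results


def dummy_metric(state):
    # Stand-in for an expensive metric such as FID; not a real FID.
    return sum(state["weights"])
```

With this layout, training calls `train(num_epochs, ckpt_dir)` without an `eval_fn`, and a second process (possibly on another GPU or after training finishes) calls `evaluate_checkpoints(ckpt_dir, metric_fn)`, so a slow evaluation never blocks the training loop.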