latent-diffusion When to Stop the Training for Unconditional Training on FFHQ Dataset

I am training from scratch on the full FFHQ dataset (70k images), with a base learning rate of 1.0e-06 and scale_lr=True.

But the training loss oscillates a lot. With smoothing set to 0.99, I can see the loss descending. Is that expected?
And when can I stop the training? Even after 300 epochs, there are still some artifacts in the sampled data.
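
For readers unfamiliar with the 0.99 smoothing mentioned above: it is an exponential moving average over the logged loss. A minimal sketch in Python, assuming a bias-corrected EMA similar to what plotting tools such as TensorBoard apply (the function name `ema_smooth` and the synthetic loss values are made up for illustration):

```python
import random


def ema_smooth(values, weight=0.99):
    """Exponential moving average with bias correction."""
    smoothed = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        running = weight * running + (1.0 - weight) * value
        # Divide by (1 - weight**step) so early points are not biased toward zero.
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


# Synthetic noisy loss: a slow downward trend buried in per-step noise.
random.seed(0)
raw = [1.0 - 0.0005 * i + random.gauss(0.0, 0.2) for i in range(1000)]
smooth = ema_smooth(raw, weight=0.99)

print(f"raw mean, first 100 steps: {sum(raw[:100]) / 100:.3f}")
print(f"raw mean, last 100 steps:  {sum(raw[-100:]) / 100:.3f}")
print(f"smoothed value at the end: {smooth[-1]:.3f}")
```

With a weight of 0.99, each point averages over roughly the last 100 steps, so a noisy per-step diffusion loss can hide a slow trend that only the smoothed curve shows.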

Here is the training curve on the training dataset.

Here is the sampled data. The teeth of the first person look abnormal.

Apr 18 '23 03:04 Nyquist0

Hi, have you calculated the FID score?

Apr 18 '23 09:04 ader47

Oops, good catch! I will do that. Thanks.

Apr 19 '23 01:04 Nyquist0

With base lr=1e-6, batch size=42, scale_lr=True, and 200 DDIM steps, I got an FID of 18.36, while the official result is 4.98.

There is still a gap, even though I have already trained for 420 epochs on the FFHQ 70k data.
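
A note on what scale_lr=True does to the numbers above. As far as I can tell from the repo's main.py, the base learning rate is multiplied by the batch size, the number of GPUs, and the gradient accumulation steps. The sketch below reproduces that rule in Python; the function name is hypothetical, and the GPU count of 1 is an assumption because the post does not state it:

```python
def effective_learning_rate(base_lr, batch_size, num_gpus=1,
                            accumulate_grad_batches=1, scale_lr=True):
    """Learning rate actually used by the optimizer under the scale_lr rule."""
    if scale_lr:
        return accumulate_grad_batches * num_gpus * batch_size * base_lr
    return base_lr


# Values from the post; num_gpus=1 is an assumption, not stated in the thread.
lr = effective_learning_rate(base_lr=1e-6, batch_size=42, num_gpus=1)
print(f"effective lr: {lr:.2e}")  # prints "effective lr: 4.20e-05"
```

So a base lr of 1e-6 does not mean the optimizer runs at 1e-6. When comparing runs with different batch sizes or GPU counts, the effective value is the one to compare.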

Apr 19 '23 11:04 Nyquist0

I trained for 400 epochs on LSUN_Churches with the provided config file, but the FID was about 14, while the reported FID is 4.02. I cannot reproduce the result either.

Apr 19 '23 11:04 ader47

Thanks for sharing. I assume there might be some other tricks. May I ask how long (in time and epochs) it took to converge on LSUN?

Apr 20 '23 02:04 Nyquist0

On LSUN_Churches, the loss converges within just a few epochs (1-3), but the images are bad. I had to train for about 140 epochs to get the FID down to about 14, and it did not decrease any further. Each epoch takes about thirty minutes on 2 RTX 3090s.

Apr 20 '23 12:04 ader47

Thanks. That helps a lot.

Apr 25 '23 02:04 Nyquist0

Instead of using the pretrained vq-4 from the latent-diffusion repo, I used the pretrained KL-8 from Stable Diffusion and managed to reproduce the result by training from scratch (you can see the DiT repo for it).

May 08 '23 15:05 quandao10

Thank you so much! I'll try it!

May 08 '23 15:05 ader47

Have you changed the parameters of the Unet or the learning rate?

May 08 '23 15:05 ader47

Yeah, I used a fixed learning rate of 5e-5, and the Unet architecture is pretty much the same. The only difference is that KL-8 downsamples from 256 to 32 (instead of vq-4 from 256 to 64), so you can use a smaller Unet architecture. Note that I used the pretrained autoencoder from Stable Diffusion, not the one from this repo.
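
To make the size difference concrete, here is a small Python sketch of the latent shapes for the two autoencoders. The channel counts (4 for KL-8, 3 for vq-4) are taken from the latent sizes quoted later in this thread (4x32x32 and 3x64x64); the helper names are hypothetical:

```python
def latent_shape(image_size, downsample_factor, latent_channels):
    """Shape (C, H, W) of the latent for a square input image."""
    side = image_size // downsample_factor
    return (latent_channels, side, side)


def num_elements(shape):
    channels, height, width = shape
    return channels * height * width


kl8 = latent_shape(256, downsample_factor=8, latent_channels=4)
vq4 = latent_shape(256, downsample_factor=4, latent_channels=3)

print("KL-8 latent:", kl8, "elements:", num_elements(kl8))
print("vq-4 latent:", vq4, "elements:", num_elements(vq4))
print("vq-4 / KL-8 element ratio:", num_elements(vq4) / num_elements(kl8))
```

The KL-8 latent has a third as many elements per image, which is consistent with being able to fit a larger batch or a smaller Unet on the same card.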

May 08 '23 15:05 quandao10

Thanks! That helps a lot!! 🥹🥹

May 08 '23 16:05 ader47

Hi @quandao10, which dataset did you reproduce the result on? I found that the VAE used in Stable Diffusion is the same as the VAE used for the lsun_church dataset.

May 09 '23 07:05 ader47

Hi @ader47, what's your batch size and memory usage? I ran into an out-of-memory issue, which forced me to set a lower batch size.

May 19 '23 03:05 GYukai

I used an RTX 3090 (24 GB). The batch size was about 50 for the 4x32x32 latent size and 24 for 3x64x64. I have to use 2 cards to match the batch size of the original paper.

May 19 '23 06:05 ader47

Hi, I am trying to reproduce the result on LSUN Churches, but I ran into a problem calculating the FID. Could you please give some advice on the authoritative way to calculate FID? I am not sure whether the FID result is right when I use an open-source implementation.

Jun 05 '23 08:06 huangyehui

The FID in the paper was calculated with 'torch-fidelity'.
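
A minimal sketch of computing FID with the torch-fidelity Python API, assuming the package is installed and that the generated samples and the reference images sit in two folders. The folder names and the helper names `build_fid_kwargs` and `compute_fid` are hypothetical; `calculate_metrics` and the `frechet_inception_distance` result key are from torch-fidelity as far as I know, so please check its documentation for your version:

```python
def build_fid_kwargs(generated_dir, reference_dir, use_cuda=True):
    """Arguments for torch_fidelity.calculate_metrics with only FID enabled."""
    return {
        "input1": generated_dir,   # folder of generated samples
        "input2": reference_dir,   # folder of reference (real) images
        "cuda": use_cuda,
        "fid": True,
        "isc": False,
        "kid": False,
        "verbose": False,
    }


def compute_fid(generated_dir, reference_dir, use_cuda=True):
    # Imported lazily so this file can be loaded without torch-fidelity installed.
    import torch_fidelity

    metrics = torch_fidelity.calculate_metrics(
        **build_fid_kwargs(generated_dir, reference_dir, use_cuda)
    )
    return metrics["frechet_inception_distance"]


# Example call (hypothetical paths, not run here):
# fid = compute_fid("samples/ffhq_ldm", "data/ffhq_train_256")
```

Scores from different FID implementations or different reference sets are not directly comparable, so it helps to use the same tool and the same reference images as the number you are comparing against.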

Jun 05 '23 08:06 ader47

Thanks a lot for the amazingly quick reply :). It is very helpful to me.

Jun 05 '23 08:06 huangyehui

By the way, what FID did you finally reach on LSUN Churches? Did you manage to reproduce the result in the paper?

Jun 05 '23 08:06 huangyehui

No, my lowest FID was a bit over 11 😥

Jun 05 '23 08:06 ader47

Ouch, that hurts.

Jun 05 '23 09:06 huangyehui

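May I ask a question: after the autoencoder is trained, how is the diffusion model trained in the latent space? For example, the original image x goes through the autoencoder's encoder, i.e. posterior = autoencoder.encoder(x), z = posterior.sample(). Is this z the x0 in latent space that corresponds to image x? Noise is then added to this x0 to get xt, the model predicts the noise, and the loss is computed. After training, sampling starts from xT = torch.randn(shape of the latent space) and steps backward one step at a time until x0, and finally autoencoder.decoder(x0) gives the sampled image. Is this workflow correct? I hope to get a reply. Thanks!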
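
The workflow described above matches the usual latent diffusion recipe as far as I can tell. Below is a toy sketch of one training step in plain Python, so it runs without PyTorch. Everything in it is an illustrative assumption: the latent is a random flat list standing in for `posterior.sample()`, the noise schedule is the plain DDPM linear schedule rather than whatever the repo config uses, and the "model" is a dummy that predicts zeros:

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, alphas_cumprod, noise):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * noise."""
    a = math.sqrt(alphas_cumprod[t])
    b = math.sqrt(1.0 - alphas_cumprod[t])
    return [a * z + b * e for z, e in zip(z0, noise)]


def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


random.seed(0)
alphas_cumprod = cumulative_alphas(linear_beta_schedule(1000))

z0 = [random.gauss(0.0, 1.0) for _ in range(16)]  # stand-in for the encoded latent
t = random.randrange(1000)                        # random timestep for this sample
noise = [random.gauss(0.0, 1.0) for _ in z0]      # the target the model must predict
zt = q_sample(z0, t, alphas_cumprod, noise)

predicted_noise = [0.0] * len(zt)                 # dummy model output, not a real Unet
loss = mse(predicted_noise, noise)
print(f"t={t}, loss={loss:.4f}")
```

The reverse process is not shown here. At sampling time the loop starts from Gaussian noise of the latent shape, denoises step by step, and only the final latent is passed through the decoder, as the question describes.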

Jul 31 '23 04:07 Zjz999hbq

I stumbled upon this passage in the paper: "We follow common practice and estimate the statistics for calculating the FID-, Precision- and Recall-scores [29,50] shown in Tab. 1 and 10 based on 50k samples from our models and the entire training set of each of the shown datasets."

Does this mean they calculated the FID against the training dataset instead of a validation set?
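
For context on what "statistics" means in that passage: FID is the Fréchet distance between two Gaussians fitted to Inception features, one for the reference images and one for the generated images. In the standard definition, with $(\mu_r, \Sigma_r)$ the feature mean and covariance of the reference set and $(\mu_g, \Sigma_g)$ those of the generated set:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Reading the quoted passage literally, $(\mu_r, \Sigma_r)$ would be estimated from the entire training set and $(\mu_g, \Sigma_g)$ from 50k generated samples. Using a different reference set or fewer samples changes the score, so this is one thing to match when comparing against the paper.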

Sep 25 '23 22:09 pysaba

Hi @quandao10, are you sure that using KL-8 instead of VQGAN-4 works? I tried it, even though it does not make sense to me, and as expected it didn't work: I only got an FID of 13.

Has anyone managed to reproduce the results in the paper?

Nov 05 '23 22:11 eslambakr

May I ask how long it takes to calculate an FID? Why does testing take so long for me? It takes about 40 seconds to generate 8 images at once (batch_size=8). Looking forward to your reply.
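
A rough back-of-the-envelope estimate in Python, assuming the 50k generated samples mentioned in the paper passage quoted earlier in this thread and a constant 40 seconds per batch of 8. The function name is hypothetical, and the time to compute Inception features is ignored:

```python
def sampling_time_hours(num_samples, batch_size, seconds_per_batch):
    """Wall-clock hours needed to generate num_samples images in batches."""
    num_batches = -(-num_samples // batch_size)  # ceiling division
    return num_batches * seconds_per_batch / 3600.0


hours = sampling_time_hours(num_samples=50_000, batch_size=8, seconds_per_batch=40.0)
print(f"{hours:.1f} hours")  # prints "69.4 hours"
```

At that rate, sample generation dominates the cost. Fewer sampling steps, a larger batch, or more GPUs should shorten it roughly in proportion.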

Dec 12 '23 01:12 wj7486