Dao Minh Quan
Instead of using the pretrained vq-4 from the latent-diffusion repo, I used the KL-8 autoencoder pretrained for Stable Diffusion and managed to reproduce the result (you can see the DiT repo for it)...
Yeah, I used a fixed learning rate of 5e-5, and the UNet architecture is pretty much the same; the only difference here is that KL-8 downsamples from 256 to 32 (instead of vq-4 from...
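A minimal sketch of the KL-8 encoding step described above, using the `AutoencoderKL` class from the `diffusers` package (the checkpoint name and the 0.18215 latent scaling factor follow the DiT repo's convention; batch shapes and the random input are placeholder assumptions):

```python
import torch
from diffusers.models import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device)
vae.eval()

@torch.no_grad()
def encode_to_latents(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 256, 256) in [-1, 1] -> latents: (B, 4, 32, 32)."""
    posterior = vae.encode(images).latent_dist
    # 0.18215 is the latent scaling factor used by Stable Diffusion / DiT.
    return posterior.sample() * 0.18215

# Example: a random batch just to check the 256 -> 32 downsampling.
x = torch.randn(2, 3, 256, 256, device=device)
z = encode_to_latents(x)
print(z.shape)  # torch.Size([2, 4, 32, 32])
```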
I met the same problem while running SER-FQA, but on Ubuntu. I'm not sure if it is the same on Windows. The way I solved that problem was to install...
If you are using the DiT architecture, you should install torch>=2.0; flash attention will let you train faster but will sacrifice some performance. Or you could run the encoder on the...
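For reference, a minimal sketch of the torch>=2.0 attention path: `F.scaled_dot_product_attention` dispatches to a flash-attention kernel when the hardware and dtypes allow it. The tensor shapes here are illustrative assumptions, and the `sdp_kernel` context is only used to force the flash backend for verification:

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim), fp16 on CUDA as flash attention requires
q = torch.randn(4, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(4, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(4, 8, 256, 64, device="cuda", dtype=torch.float16)

# Force the flash kernel; this raises if flash attention is unavailable.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([4, 8, 256, 64])
```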
Hi, I'm understanding that you retrain our model and get 9.21. Is it correct ?
Please note that our stat file is computed using JPG images. If the generated images are PNGs, this leads to a very high FID.
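A minimal sketch of re-encoding generated PNGs as JPGs before computing FID, so the samples match the JPG-based reference statistics (the directory names and the quality setting are assumptions, not values from the repo):

```python
from pathlib import Path
from PIL import Image

src, dst = Path("samples_png"), Path("samples_jpg")
dst.mkdir(exist_ok=True)
for png in sorted(src.glob("*.png")):
    img = Image.open(png).convert("RGB")  # drop any alpha channel
    img.save(dst / (png.stem + ".jpg"), quality=95)
```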
I trained the model for 600 epochs and evaluated at epoch 475 for CelebHQ-256.
Yes, the model seems unstable after 500 epochs. In our paper, we use cosine learning-rate decay, and it depends on the total number of epochs. To be more stable, we suggest...
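A minimal sketch of cosine learning-rate decay with `T_max` tied to the total number of epochs, as described above; the optimizer choice, stand-in model, and epoch count are placeholder assumptions:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual network
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
total_epochs = 600
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs)

for epoch in range(total_epochs):
    # ... one epoch of training ...
    optimizer.step()
    scheduler.step()  # decays the LR along a cosine curve over total_epochs
```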
Yes, you should use DiT/train.py to revise my code. I found it easier and more compact to follow the DiT repo.
Yes, I think you ran it correctly; I wonder what environment you used to run the model. I found that the architecture is more stable with torch 1.x. I retrained...