Some details about the implementation
Hi! Thanks for your extraordinary work! I have a question about the training. The paper mentions that "We introduce a Lightning T2I branch alongside the regular diffusion branch," but in the method section all the loss calculations are carried out in the Lightning T2I branch. I would like to know: during training, is it enough to use only a Lightning model (SDXL-Lightning) instead of also using the original base model (SDXL)? I mean that the original base model would not need any trainable parameters. In that case, where is the "alongside" reflected?
In Stage 1, we trained on the non-accelerated model (sdxl-base) using the traditional diffusion loss. For Stages 2 and 3, although in the first implementation all losses were calculated on the Lightning branch, we believe it is also fine to place the diffusion loss on the non-accelerated model, and this will also help with compatibility.
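As a rough illustration (a minimal sketch, not the actual training code), a traditional diffusion loss on sdxl-base looks like the following; the id_embeds hook and the UNet call signature are placeholders, and SDXL's added conditioning kwargs are omitted for brevity:

import torch
import torch.nn.functional as F

# Minimal sketch of a traditional diffusion loss on the non-accelerated
# branch (sdxl-base). The id_embeds hook is only a placeholder for the ID
# conditioning; SDXL's added_cond_kwargs are omitted for brevity.
def diffusion_loss_step(unet, scheduler, latents, text_emb, id_embeds):
    bsz = latents.shape[0]
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (bsz,), device=latents.device
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(
        noisy_latents,
        timesteps,
        encoder_hidden_states=text_emb,
        cross_attention_kwargs={"id_embeds": id_embeds},  # placeholder hook
    ).sample
    return F.mse_loss(noise_pred, noise)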
Thanks for your quick responses!
By the way, how many timesteps did you use for flux-dev PuLID training (when calculating the ID loss)?
I am training PuLID in a similar way: the non-accelerated model for the diffusion loss and the accelerated model for the ID loss, and a weird problem arises: the generated image contains many faces of the same person. How can I resolve it? Thanks.
Hi @garychan22, could you let me know what training dataset you used? Do you have an open-source implementation of your training pipeline?
@ToTheBeginning
In Stage 2, is the diffusion loss trained on the Lightning branch?
Specifically, I mean: when adding noise to the latents, is it
timesteps = torch.randint(0, pipe.scheduler.num_train_timesteps, (bsz,), device=latents.device)
noise_latent = pipe.scheduler.add_noise(latents, noise, timesteps)
and is noise_latent then denoised by the Lightning branch with 4 timesteps?
But in this way the randomly selected timesteps range from 0 to 999, so will the 4-step Lightning-branch denoising still work? Or is it timesteps = torch.tensor([999] * bsz), so that the 4-step Lightning branch can denoise properly to calculate the diffusion loss, and meanwhile decode with the VAE to generate the images for the ID loss?
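To make my question concrete, here is roughly the Lightning-branch loop I have in mind (just my own sketch, assuming a 4-step schedule and a hypothetical face_id_loss; the id_embeds hook is a placeholder):

import torch

# Sketch of the accelerated (Lightning) branch: start from pure noise,
# denoise with a fixed 4-step schedule, decode with the VAE, then compute
# the ID loss on the generated image. The schedule and conditioning hook
# are my assumptions, not confirmed details from the paper.
def lightning_id_loss(unet, scheduler, vae, text_emb, id_embeds,
                      face_id_loss, latent_shape, device):
    scheduler.set_timesteps(4, device=device)  # 4-step sampling schedule
    x = torch.randn(latent_shape, device=device) * scheduler.init_noise_sigma
    for t in scheduler.timesteps:
        model_input = scheduler.scale_model_input(x, t)
        noise_pred = unet(
            model_input,
            t.expand(x.shape[0]),
            encoder_hidden_states=text_emb,
            cross_attention_kwargs={"id_embeds": id_embeds},  # placeholder hook
        ).sample
        x = scheduler.step(noise_pred, t, x).prev_sample
    images = vae.decode(x / vae.config.scaling_factor).sample
    return face_id_loss(images, id_embeds)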
@ToTheBeginning @guozinan126 @tsinggggg @zsxkib
Besides, does the negative prompt participate in training?
Hi, how is your work going? I haven't managed to reproduce Stage 2; the face identity does not match. Could I discuss it with you?