Some details about the implementation
Hi! Thanks for your extraordinary work! I have a question about the training. The paper mentions that "We introduce a Lightning T2I branch alongside the regular diffusion branch," but in the method section all the loss calculations are carried out in the Lightning T2I branch. I would like to know: during training, is it enough to use only a Lightning model (SDXL-Lightning) instead of also using the original base model (SDXL)? I mean that the original base model would not need any trainable parameters. In that case, where is the "alongside" reflected?
In Stage 1, we trained on the non-accelerated model (sdxl-base) using the traditional diffusion loss. For Stages 2 and 3, although in the first implementation all losses were calculated on the Lightning branch, we believe it is also fine to place the diffusion loss on the non-accelerated model, and this will also help with compatibility.
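As a rough illustration (a minimal sketch, not the actual training code), a traditional diffusion loss on sdxl-base looks like the following; the id_embeds hook and the UNet call signature are placeholders, and SDXL's added conditioning kwargs are omitted for brevity:

import torch
import torch.nn.functional as F

# Minimal sketch of a traditional diffusion loss on the non-accelerated
# branch (sdxl-base). The id_embeds hook is only a placeholder for the ID
# conditioning; SDXL's added_cond_kwargs are omitted for brevity.
def diffusion_loss_step(unet, scheduler, latents, text_emb, id_embeds):
    bsz = latents.shape[0]
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (bsz,), device=latents.device
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(
        noisy_latents,
        timesteps,
        encoder_hidden_states=text_emb,
        cross_attention_kwargs={"id_embeds": id_embeds},  # placeholder hook
    ).sample
    return F.mse_loss(noise_pred, noise)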
Thanks for your quick responses!
By the way, how many timesteps did you use for flux-dev PuLID training (when calculating the ID loss)?
I am training PuLID in a similar way: the non-accelerated model for the diffusion loss and the accelerated model for the ID loss, and a weird problem arises: the generated image contains many faces of the same person. How can I resolve it? Thanks.
Hi @garychan22, could you let me know what training dataset you used? Do you have an open-source implementation of your training pipeline?
@ToTheBeginning
In Stage 2, is the diffusion loss trained on the Lightning branch?
Specifically, I mean: when adding noise to the latents, is it
timesteps = torch.randint(0, pipe.scheduler.num_train_timesteps, (bsz,), device=latents.device)
noise_latent = pipe.scheduler.add_noise(latents, noise, timesteps)
and is noise_latent then denoised by the Lightning branch with 4 timesteps?
But in this way the randomly selected timesteps range from 0 to 999, so will the 4-step Lightning-branch denoising still work? Or is it timesteps = torch.tensor([999] * bsz), so that the 4-step Lightning branch can denoise properly to calculate the diffusion loss, and meanwhile decode with the VAE to generate the images for the ID loss?
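To make my question concrete, here is roughly the Lightning-branch loop I have in mind (just my own sketch, assuming a 4-step schedule and a hypothetical face_id_loss; the id_embeds hook is a placeholder):

import torch

# Sketch of the accelerated (Lightning) branch: start from pure noise,
# denoise with a fixed 4-step schedule, decode with the VAE, then compute
# the ID loss on the generated image. The schedule and conditioning hook
# are my assumptions, not confirmed details from the paper.
def lightning_id_loss(unet, scheduler, vae, text_emb, id_embeds,
                      face_id_loss, latent_shape, device):
    scheduler.set_timesteps(4, device=device)  # 4-step sampling schedule
    x = torch.randn(latent_shape, device=device) * scheduler.init_noise_sigma
    for t in scheduler.timesteps:
        model_input = scheduler.scale_model_input(x, t)
        noise_pred = unet(
            model_input,
            t.expand(x.shape[0]),
            encoder_hidden_states=text_emb,
            cross_attention_kwargs={"id_embeds": id_embeds},  # placeholder hook
        ).sample
        x = scheduler.step(noise_pred, t, x).prev_sample
    images = vae.decode(x / vae.config.scaling_factor).sample
    return face_id_loss(images, id_embeds)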
@ToTheBeginning @guozinan126 @tsinggggg @zsxkib
Besides, does the negative prompt participate in training?
Hi, how is your work going? I haven't managed to reproduce Stage 2; the face identity does not match. Could I discuss it with you?