ColossalAI [BUG]: Stable Diffusion outputs random noise

I tried to train on the Teyvat data using: python main.py --logdir /tmp/ -t -b configs/Teyvat/train_colossalai_teyvat.yaml. This produced a ckpt file of 9.7 GB. But when I ran python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --outdir ./output --config path/to/logdir/checkpoints/last.ckpt --ckpt /path/to/logdir/configs/project.yaml, I got a picture of random noise. (I just followed the steps in this README: https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion)

I think you didn't try your own code, because this command from the README is obviously wrong: python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --outdir ./output --config path/to/logdir/checkpoints/last.ckpt --ckpt /path/to/logdir/configs/project.yaml. The --config and --ckpt arguments are swapped: --config is given the checkpoint file and --ckpt is given the YAML config.
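
To illustrate the swapped arguments, here is a minimal sketch of a path check that could run before inference. The function name check_txt2img_paths and the suffix sets are hypothetical and are not part of ColossalAI or scripts/txt2img.py; the sketch only assumes that the config is a YAML file and the checkpoint ends in .ckpt, as in the command above.

```python
from pathlib import Path
from typing import Tuple

# Assumed file suffixes; adjust if your project uses other extensions.
CONFIG_SUFFIXES = {".yaml", ".yml"}
CKPT_SUFFIXES = {".ckpt"}


def check_txt2img_paths(config: str, ckpt: str) -> Tuple[str, str]:
    """Return (config, ckpt), swapping the two if they were clearly reversed."""
    config_ext = Path(config).suffix.lower()
    ckpt_ext = Path(ckpt).suffix.lower()

    # Arguments are already in the expected order.
    if config_ext in CONFIG_SUFFIXES and ckpt_ext in CKPT_SUFFIXES:
        return config, ckpt

    # Arguments were passed in reverse, as in the README command.
    if config_ext in CKPT_SUFFIXES and ckpt_ext in CONFIG_SUFFIXES:
        return ckpt, config

    raise ValueError(
        f"Cannot tell config from checkpoint: config={config!r}, ckpt={ckpt!r}"
    )


if __name__ == "__main__":
    fixed_config, fixed_ckpt = check_txt2img_paths(
        "path/to/logdir/checkpoints/last.ckpt",
        "/path/to/logdir/configs/project.yaml",
    )
    print("--config", fixed_config)
    print("--ckpt", fixed_ckpt)
```

A check like this only catches the argument order. It does not explain the noise by itself, since the output can still be noise with the arguments in the right order.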

So is the ckpt file created by ColossalAI wrong, or is there some other problem?

Jan 11 '23 03:01 yufengyao-lingoace

Hi @yufengyao-lingoace

Have you tried the latest ColossalAI? You can install the newest version from source. As for your problem, I'm trying to reproduce this bug and fix it.

Jan 12 '23 02:01 1SAA

@1SAA Hi, I tried both pip install colossalai==0.1.12+torch1.12cu11.3 -f https://release.colossalai.org and installing from source, but got the same result. The training process seems to have no problem: Epoch 0: 80%|███▉ | 130/163 [13:22<03:23, 6.17s/it, loss=0.425, v_num=0, train/loss_simple_step=0.334, train/loss_vlb_step=0.334, train/loss_step=0.334, global_step=129.0, lr_abs=1.6e-7]

I also found another problem: after I trained to a last.ckpt with a loss of 0.4, I stopped training and resumed from that last.ckpt, and the loss in the first epoch was 1.0 instead of 0.4. I don't know whether this information is useful to you.

Jan 12 '23 03:01 yufengyao-lingoace

Hi @yufengyao-lingoace

Yeah, I got the same random noise. I reckon it's a problem with model saving. I'm trying to figure it out.

Jan 12 '23 03:01 1SAA

@1SAA OK, thanks very much. Waiting for your fix!

Jan 12 '23 03:01 yufengyao-lingoace

Hi @yufengyao-lingoace

The truth is that the pre-trained weights are not loaded. Since Stability AI updated Stable Diffusion and changed the structure of the model, our old model cannot load the latest pre-trained weights. Thus, the fine-tuning example is broken now. 😭 You could try the DreamBooth example instead. It will take some time to fix the diffuser example. 😢
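
As an illustration of how a structure change like this shows up, here is a minimal sketch that compares parameter names and shapes between a model and a checkpoint. It is not the ColossalAI loading code. The function compare_weights and the example key names are hypothetical; in PyTorch the inputs could be built with {k: tuple(v.shape) for k, v in model.state_dict().items()}, and the checkpoint side from whatever dictionary of tensors the checkpoint file holds (an assumption about its layout).

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def compare_weights(
    model_shapes: Mapping[str, Shape],
    ckpt_shapes: Mapping[str, Shape],
) -> Dict[str, List[str]]:
    """Report parameter names that would fail to load from the checkpoint."""
    # Present in the model but absent from the checkpoint: stays at random init.
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Present in the checkpoint but unknown to the model: silently unused.
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Present in both, but with different tensor shapes.
    shape_mismatch = sorted(
        k
        for k in model_shapes
        if k in ckpt_shapes and tuple(model_shapes[k]) != tuple(ckpt_shapes[k])
    )
    return {
        "missing": missing,
        "unexpected": unexpected,
        "shape_mismatch": shape_mismatch,
    }


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    old_model = {"unet.in.weight": (320, 4, 3, 3), "unet.attn.proj": (320, 768)}
    new_ckpt = {"unet.in.weight": (320, 4, 3, 3), "unet.attn.proj": (320, 1024)}
    print(compare_weights(old_model, new_ckpt))
```

If many names land in "missing" or "shape_mismatch", most of the network starts from random initialization, which would be consistent with noise after only a few epochs.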

Jan 12 '23 09:01 1SAA

@1SAA Thanks for your reply. Does this mean that without the pre-trained model we cannot get a good output after a few epochs? I trained on 2000 pictures for 60 epochs, but I still get pictures of random noise. Is that because 60 epochs is not enough to produce a good picture?

Jan 12 '23 11:01 yufengyao-lingoace

I tried DreamBooth and also got random noise. So I tried saving the model in the first epoch, before any update had been applied. Then I used inference.py to generate a picture, and still got wrong results.

Command:
python inference_qll.py --prompt "a girl with red hat" --modelpath "/data/qll/ColossalAI_3/ColossalAI/outputs/checkpoint-0"

In train_dreambooth_colossalai.py:

torch.cuda.synchronize()
for epoch in range(args.num_train_epochs):
    unet.train()
    for step, batch in enumerate(train_dataloader):
        if gpc.get_local_rank(ParallelMode.DATA) == 0 and False:
            logger.info(f"before saving and pipline", ranks=[0])
            pipeline = DiffusionPipeline.from_pretrained(
                args.pretrained_model_name_or_path,
                unet=convert_to_torch_module(unet),
                revision=args.revision,
            )
            logger.info(f"after pipline, before saving", ranks=[0])
            save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
            pipeline.save_pretrained(save_path)
            logger.info(f"Saving model checkpoint to {save_path}", ranks=[0])
        .......

In inference.py:

opt = parser.parse_args()
# seed_everything(opt.seed)

model_id = opt.modelpath
print(f"Loading model... from{model_id}")

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

pipe.safety_checker = lambda images, **kwargs: (images, False)

prompt = opt.prompt  # "A photo of an apple."
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("output.png")
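
One way to narrow down a report like this is to check whether the saved weights still match the pre-trained ones when no update has been applied. The following is a minimal sketch of that comparison, assuming the parameters have been flattened to plain lists of floats (with PyTorch, for example, via tensor.flatten().tolist()). The function max_abs_diff and the parameter name in the example are hypothetical.

```python
from typing import Dict, Mapping, Sequence


def max_abs_diff(
    before: Mapping[str, Sequence[float]],
    after: Mapping[str, Sequence[float]],
) -> Dict[str, float]:
    """Largest absolute difference per parameter name present in both mappings."""
    report: Dict[str, float] = {}
    for name in before.keys() & after.keys():
        a, b = before[name], after[name]
        if len(a) != len(b):
            # Different sizes cannot be compared element-wise; flag as infinite.
            report[name] = float("inf")
            continue
        report[name] = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return report


if __name__ == "__main__":
    # Hypothetical values: weights before saving and after reloading.
    pretrained = {"unet.w": [1.0, 2.0, 3.0]}
    reloaded = {"unet.w": [1.0, 2.5, 3.0]}
    print(max_abs_diff(pretrained, reloaded))
```

Large differences at step 0 would point at the save or conversion path rather than at training. Small differences are expected if the weights are stored in a lower precision such as float16.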

Feb 01 '23 09:02 QiuLL

Is this a useless project?

Feb 13 '23 04:02 EricZgw

Is this a useless project?

It has been fixed in https://github.com/hpcaitech/ColossalAI/pull/2561

Feb 13 '23 05:02 Fazziekey

The noise problem in the ColossalAI version has been fixed in https://github.com/hpcaitech/ColossalAI/pull/2561

Feb 13 '23 05:02 Fazziekey