
UnpicklingError from ffhq config file

Open KyonP opened this issue 3 years ago • 6 comments

I was following the instructions for training on the FFHQ dataset and ran into this error:

(ldm) user99@fd5a72f3f***:/home/user99/stable-diffusion# CUDA_VISIBLE_DEVICES=0 python main.py --base configs/latent-diffusion/ffhq-ldm-vq-4.yaml -t --gpus 0
Global seed set to 23
Running on GPUs 0
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 274.06 M params.
Keeping EMAs of 370.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "main.py", line 535, in <module>
    model = instantiate_from_config(config.model)
  File "/home/user99/stable-diffusion/ldm/util.py", line 85, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/home/user99/stable-diffusion/ldm/models/diffusion/ddpm.py", line 460, in __init__
    self.instantiate_first_stage(first_stage_config)
  File "/home/user99/stable-diffusion/ldm/models/diffusion/ddpm.py", line 503, in instantiate_first_stage
    model = instantiate_from_config(config)
  File "/home/user99/stable-diffusion/ldm/util.py", line 85, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/home/user99/stable-diffusion/ldm/models/autoencoder.py", line 266, in __init__
    super().__init__(embed_dim=embed_dim, *args, **kwargs)
  File "/home/user99/stable-diffusion/ldm/models/autoencoder.py", line 59, in __init__
    self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)
  File "/home/user99/stable-diffusion/ldm/models/autoencoder.py", line 79, in init_from_ckpt
    sd = torch.load(path, map_location="cpu")["state_dict"]
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'm'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 740, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined

It seems like loading ffhq-ldm-vq-4.yaml caused the issue, but I couldn't find a solution.

I have checked the dependency versions, and the Anaconda environment seems fine since I can run the txt2img script.

Any clues on what to look for? Any suggestions would be greatly appreciated.
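
For reference, a quick sanity check on the checkpoint file itself (the path below is an assumption; use whatever ckpt_path your ffhq-ldm-vq-4.yaml actually points at) is to look at its size and first bytes:

    # Sketch: verify the first-stage checkpoint is a real PyTorch file and not
    # an HTML error page or placeholder. The path is an assumed example.
    from pathlib import Path

    ckpt = Path("models/first_stage_models/vq-f4/model.ckpt")
    print("size (bytes):", ckpt.stat().st_size)
    with ckpt.open("rb") as f:
        print("first bytes:", f.read(8))
    # Zip-based torch checkpoints start with b"PK\x03\x04" and legacy pickles
    # typically with b"\x80"; anything else (here the loader saw 'm') means the
    # file on disk is not an actual checkpoint.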

KyonP avatar Oct 23 '22 08:10 KyonP

Same error when I run

python3 main.py --train --base configs/latent-diffusion/lsun_bedrooms-ldm-vq-4_ss.yaml

Though I also had to change the first-stage config path from configs/first_stage_models/vq-f4/model.yaml to models/first_stage_models/vq-f4/config.yaml
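
To see exactly which first-stage checkpoint the training config will try to torch.load, here is a minimal sketch (assuming the usual latent-diffusion config layout; main.py already uses OmegaConf):

    # Sketch, assuming the standard latent-diffusion config layout: print the
    # first-stage target class and the ckpt_path that gets passed to torch.load.
    from omegaconf import OmegaConf

    cfg = OmegaConf.load("configs/latent-diffusion/ffhq-ldm-vq-4.yaml")
    fs = cfg.model.params.first_stage_config
    print("first-stage target:", fs.target)
    print("ckpt_path:", fs.params.get("ckpt_path", "<not set>"))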

ssusie avatar Oct 24 '22 20:10 ssusie

Try the fix described in https://github.com/CompVis/stable-diffusion/issues/432

aleff-github avatar Oct 28 '22 13:10 aleff-github

I solved this issue by re-downloading the model files with this script.

Maybe it was caused by a corrupted or incomplete download. Thank you for your suggestions!
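
If anyone wants to rule out a corrupted download before re-training, a small sketch (the path is just an example) that prints the file size and SHA-256 so they can be compared with the values on the download page:

    # Sketch: hash a downloaded checkpoint in chunks so large files don't need
    # to fit in memory; compare the digest with the one published for the file.
    import hashlib
    from pathlib import Path

    path = Path("models/first_stage_models/vq-f4/model.ckpt")
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    print(path.stat().st_size, h.hexdigest())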

KyonP avatar Oct 28 '22 17:10 KyonP

Any solution? Re-downloading the model by running the script doesn't work for me.

dongzhuoyao avatar Nov 03 '22 09:11 dongzhuoyao

Any solution? Re-downloading the model by running the script doesn't work for me.

Try following these steps:

  1. Go to https://huggingface.co/
  2. Sign up and log in
  3. Go to https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
  4. Accept the terms (you can't download the file otherwise)
  5. Click the link for sd-v1-4.ckpt (or click directly here)

I added these steps to the README.md of this repository -> https://github.com/CompVis/stable-diffusion/pull/437
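
After downloading, a quick check that the file actually unpickles before pointing a config at it (the filename is just whatever you saved the download as):

    # Sketch: load the downloaded checkpoint on CPU and peek at its keys;
    # an UnpicklingError here means the download itself is bad.
    import torch

    ckpt = torch.load("sd-v1-4.ckpt", map_location="cpu")
    print(list(ckpt.keys()))            # typically includes "state_dict"
    print(len(ckpt["state_dict"]))      # number of tensors in the model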

aleff-github avatar Nov 03 '22 12:11 aleff-github

Hi, could you please point me to how to prepare the FFHQ dataset? I followed the instructions and used PGGAN, and obtained a folder of *.tfrecord files, which does not seem to be the correct training data format. Besides, there seems to be no ldm.data.faceshq.FFHQTrain module in the code.

sunshineatnoon avatar Feb 01 '23 00:02 sunshineatnoon