Loading PT Files for VAE with the AutoEncoder is still broken!!!
Describe the bug
Just doesn't load .pt files anymore. Really frustrating, as it's been broken for a long time now. I keep posting about it, so now I'll just open an issue/bug instead of messaging in update threads. The last working version is 0.27.2.
There is no working safetensors or diffusers version of the VAE I'm using, and I shouldn't have to convert it. PT works just fine.
Reproduction
import torch
from os import path
from diffusers import StableDiffusionPipeline, AutoencoderKL

device = "cuda"  # placeholder; set elsewhere in the original script

pipe = StableDiffusionPipeline.from_single_file(
    "./assets/models/AOM3B4_orangemixs.safetensors",
    safety_checker=None,
    requires_safety_checker=False,
    cache_dir=path.join("./assets/models"),
    local_files_only=True,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to(device, torch.bfloat16)
pipe.vae = AutoencoderKL.from_single_file(
    path.join("./assets/vae/orangemix.vae.pt"),
    local_files_only=True,
    torch_dtype=torch.bfloat16,
)
Logs
Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 108, in load_state_dict
    return torch.load(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\torch\serialization.py", line 1024, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported class pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 116, in load_state_dict
    if f.read().startswith("version"):
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1681: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\main.py", line 89, in <module>
    asyncio.run(main(args.device, args.port))
  File "C:\Program Files\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\main.py", line 79, in main
    pipe, clip_layers = shibiko_init(settings, device)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\src\generation.py", line 126, in shibiko_init
    pipe.vae = AutoencoderKL.from_single_file(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 209, in from_single_file
    checkpoint = load_single_file_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 346, in load_single_file_checkpoint
    checkpoint = load_state_dict(pretrained_model_link_or_path)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 128, in load_state_dict
    raise OSError(
OSError: Unable to load weights from checkpoint file for './assets/vae/orangemix.vae.pt' at './assets/vae/orangemix.vae.pt'.
System Info
Python 3.10.9
Two systems tested:
- AMD 7950X3D, RTX 4090 x2, 128GB DDR5, Windows 10
- AMD 5950X, RTX 4090, 128GB DDR4, Windows 10
Who can help?
@sayakpaul
Where does "/assets/vae/orangemix.vae.pt" come from?
Cc: @DN6 for single file.
Any VAE that's saved in .pt format.
Link to the VAE I'm using: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt
I know there is a diffusers version of this, however that doesn't work. It was broken in v0.27.2, so I switched to safetensors. Plus, the diffusers version doesn't have the VAE working anyway.
Hi @JemiloII, the issue here isn't the .pt format; rather, it's that the checkpoint contains serialised objects that are not model weights. See the attached screenshot below.
We stopped allowing loading of arbitrary serialised objects from pickle files after 0.27.2, since this is a potential security risk: using torch.load with weights_only=False allows executing code on the user's machine. See the attached discussions:
https://github.com/pytorch/pytorch/issues/52181
https://github.com/pytorch/pytorch/issues/52596
https://github.com/voicepaw/so-vits-svc-fork/issues/193
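(A minimal sketch of the difference, assuming the local path from the reproduction above — weights_only=True refuses this file, while opting out is what executes the embedded pickle code:)

import torch

# With weights_only=True (the safe default diffusers now relies on, per the
# traceback above), torch.load refuses to unpickle arbitrary classes:
try:
    torch.load("./assets/vae/orangemix.vae.pt", weights_only=True)
except Exception as e:
    print(e)  # Unsupported class pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint

# Opting out runs whatever code the pickle contains, so only do this
# for files from a trusted source:
state_dict = torch.load("./assets/vae/orangemix.vae.pt", map_location="cpu", weights_only=False)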
You can load the VAE state dict with weights_only=False in the following way:
import torch
from huggingface_hub import hf_hub_download
from diffusers import AutoencoderKL
state_dict = torch.load(hf_hub_download("WarriorMama777/OrangeMixs", filename="VAEs/orangemix.vae.pt"), weights_only=False)
vae = AutoencoderKL.from_single_file(state_dict)
That doesn't work.
Traceback (most recent call last):
  File "C:\Users\Shibiko AI\AppData\Roaming\JetBrains\IntelliJIdea2024.2\plugins\python\helpers-pro\pydevd_asyncio\pydevd_nest_asyncio.py", line 138, in run
    return loop.run_until_complete(task)
  File "C:\Users\Shibiko AI\AppData\Roaming\JetBrains\IntelliJIdea2024.2\plugins\python\helpers-pro\pydevd_asyncio\pydevd_nest_asyncio.py", line 243, in run_until_complete
    return f.result()
  File "C:\Program Files\Python310\lib\asyncio\futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "C:\Program Files\Python310\lib\asyncio\tasks.py", line 232, in __step
    result = coro.send(None)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\main.py", line 81, in main
    pipe, clip_layers = shibiko_init(settings, device)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\src\generation.py", line 144, in shibiko_init
    pipe.vae = AutoencoderKL.from_single_file(state_dict)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\autoencoder.py", line 119, in from_single_file
    original_config, checkpoint = fetch_ldm_config_and_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 314, in fetch_ldm_config_and_checkpoint
    checkpoint = load_single_file_model_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 339, in load_single_file_model_checkpoint
    if os.path.isfile(pretrained_model_link_or_path):
  File "C:\Program Files\Python310\lib\genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not dict
python-BaseException

Process finished with exit code 1
I tried pipe.vae.load_state_dict as well; no dice there either. I even went back and tried that on 0.27.2...
Not sure if this is helpful, but the state_dict has these keys:
encoder.conv_in.weight
encoder.conv_in.bias
encoder.down.0.block.0.norm1.weight
encoder.down.0.block.0.norm1.bias
encoder.down.0.block.0.conv1.weight
encoder.down.0.block.0.conv1.bias
encoder.down.0.block.0.norm2.weight
encoder.down.0.block.0.norm2.bias
encoder.down.0.block.0.conv2.weight
encoder.down.0.block.0.conv2.bias
encoder.down.0.block.1.norm1.weight
encoder.down.0.block.1.norm1.bias
encoder.down.0.block.1.conv1.weight
encoder.down.0.block.1.conv1.bias
encoder.down.0.block.1.norm2.weight
encoder.down.0.block.1.norm2.bias
encoder.down.0.block.1.conv2.weight
encoder.down.0.block.1.conv2.bias
encoder.down.0.downsample.conv.weight
encoder.down.0.downsample.conv.bias
encoder.down.1.block.0.norm1.weight
encoder.down.1.block.0.norm1.bias
encoder.down.1.block.0.conv1.weight
encoder.down.1.block.0.conv1.bias
encoder.down.1.block.0.norm2.weight
encoder.down.1.block.0.norm2.bias
encoder.down.1.block.0.conv2.weight
encoder.down.1.block.0.conv2.bias
encoder.down.1.block.0.nin_shortcut.weight
encoder.down.1.block.0.nin_shortcut.bias
encoder.down.1.block.1.norm1.weight
encoder.down.1.block.1.norm1.bias
encoder.down.1.block.1.conv1.weight
encoder.down.1.block.1.conv1.bias
encoder.down.1.block.1.norm2.weight
encoder.down.1.block.1.norm2.bias
encoder.down.1.block.1.conv2.weight
encoder.down.1.block.1.conv2.bias
encoder.down.1.downsample.conv.weight
encoder.down.1.downsample.conv.bias
encoder.down.2.block.0.norm1.weight
encoder.down.2.block.0.norm1.bias
encoder.down.2.block.0.conv1.weight
encoder.down.2.block.0.conv1.bias
encoder.down.2.block.0.norm2.weight
encoder.down.2.block.0.norm2.bias
encoder.down.2.block.0.conv2.weight
encoder.down.2.block.0.conv2.bias
encoder.down.2.block.0.nin_shortcut.weight
encoder.down.2.block.0.nin_shortcut.bias
encoder.down.2.block.1.norm1.weight
encoder.down.2.block.1.norm1.bias
encoder.down.2.block.1.conv1.weight
encoder.down.2.block.1.conv1.bias
encoder.down.2.block.1.norm2.weight
encoder.down.2.block.1.norm2.bias
encoder.down.2.block.1.conv2.weight
encoder.down.2.block.1.conv2.bias
encoder.down.2.downsample.conv.weight
encoder.down.2.downsample.conv.bias
encoder.down.3.block.0.norm1.weight
encoder.down.3.block.0.norm1.bias
encoder.down.3.block.0.conv1.weight
encoder.down.3.block.0.conv1.bias
encoder.down.3.block.0.norm2.weight
encoder.down.3.block.0.norm2.bias
encoder.down.3.block.0.conv2.weight
encoder.down.3.block.0.conv2.bias
encoder.down.3.block.1.norm1.weight
encoder.down.3.block.1.norm1.bias
encoder.down.3.block.1.conv1.weight
encoder.down.3.block.1.conv1.bias
encoder.down.3.block.1.norm2.weight
encoder.down.3.block.1.norm2.bias
encoder.down.3.block.1.conv2.weight
encoder.down.3.block.1.conv2.bias
encoder.mid.block_1.norm1.weight
encoder.mid.block_1.norm1.bias
encoder.mid.block_1.conv1.weight
encoder.mid.block_1.conv1.bias
encoder.mid.block_1.norm2.weight
encoder.mid.block_1.norm2.bias
encoder.mid.block_1.conv2.weight
encoder.mid.block_1.conv2.bias
encoder.mid.attn_1.norm.weight
encoder.mid.attn_1.norm.bias
encoder.mid.attn_1.q.weight
encoder.mid.attn_1.q.bias
encoder.mid.attn_1.k.weight
encoder.mid.attn_1.k.bias
encoder.mid.attn_1.v.weight
encoder.mid.attn_1.v.bias
encoder.mid.attn_1.proj_out.weight
encoder.mid.attn_1.proj_out.bias
encoder.mid.block_2.norm1.weight
encoder.mid.block_2.norm1.bias
encoder.mid.block_2.conv1.weight
encoder.mid.block_2.conv1.bias
encoder.mid.block_2.norm2.weight
encoder.mid.block_2.norm2.bias
encoder.mid.block_2.conv2.weight
encoder.mid.block_2.conv2.bias
encoder.norm_out.weight
encoder.norm_out.bias
encoder.conv_out.weight
encoder.conv_out.bias
decoder.conv_in.weight
decoder.conv_in.bias
decoder.mid.block_1.norm1.weight
decoder.mid.block_1.norm1.bias
decoder.mid.block_1.conv1.weight
decoder.mid.block_1.conv1.bias
decoder.mid.block_1.norm2.weight
decoder.mid.block_1.norm2.bias
decoder.mid.block_1.conv2.weight
decoder.mid.block_1.conv2.bias
decoder.mid.attn_1.norm.weight
decoder.mid.attn_1.norm.bias
decoder.mid.attn_1.q.weight
decoder.mid.attn_1.q.bias
decoder.mid.attn_1.k.weight
decoder.mid.attn_1.k.bias
decoder.mid.attn_1.v.weight
decoder.mid.attn_1.v.bias
decoder.mid.attn_1.proj_out.weight
decoder.mid.attn_1.proj_out.bias
decoder.mid.block_2.norm1.weight
decoder.mid.block_2.norm1.bias
decoder.mid.block_2.conv1.weight
decoder.mid.block_2.conv1.bias
decoder.mid.block_2.norm2.weight
decoder.mid.block_2.norm2.bias
decoder.mid.block_2.conv2.weight
decoder.mid.block_2.conv2.bias
decoder.up.0.block.0.norm1.weight
decoder.up.0.block.0.norm1.bias
decoder.up.0.block.0.conv1.weight
decoder.up.0.block.0.conv1.bias
decoder.up.0.block.0.norm2.weight
decoder.up.0.block.0.norm2.bias
decoder.up.0.block.0.conv2.weight
decoder.up.0.block.0.conv2.bias
decoder.up.0.block.0.nin_shortcut.weight
decoder.up.0.block.0.nin_shortcut.bias
decoder.up.0.block.1.norm1.weight
decoder.up.0.block.1.norm1.bias
decoder.up.0.block.1.conv1.weight
decoder.up.0.block.1.conv1.bias
decoder.up.0.block.1.norm2.weight
decoder.up.0.block.1.norm2.bias
decoder.up.0.block.1.conv2.weight
decoder.up.0.block.1.conv2.bias
decoder.up.0.block.2.norm1.weight
decoder.up.0.block.2.norm1.bias
decoder.up.0.block.2.conv1.weight
decoder.up.0.block.2.conv1.bias
decoder.up.0.block.2.norm2.weight
decoder.up.0.block.2.norm2.bias
decoder.up.0.block.2.conv2.weight
decoder.up.0.block.2.conv2.bias
decoder.up.1.block.0.norm1.weight
decoder.up.1.block.0.norm1.bias
decoder.up.1.block.0.conv1.weight
decoder.up.1.block.0.conv1.bias
decoder.up.1.block.0.norm2.weight
decoder.up.1.block.0.norm2.bias
decoder.up.1.block.0.conv2.weight
decoder.up.1.block.0.conv2.bias
decoder.up.1.block.0.nin_shortcut.weight
decoder.up.1.block.0.nin_shortcut.bias
decoder.up.1.block.1.norm1.weight
decoder.up.1.block.1.norm1.bias
decoder.up.1.block.1.conv1.weight
decoder.up.1.block.1.conv1.bias
decoder.up.1.block.1.norm2.weight
decoder.up.1.block.1.norm2.bias
decoder.up.1.block.1.conv2.weight
decoder.up.1.block.1.conv2.bias
decoder.up.1.block.2.norm1.weight
decoder.up.1.block.2.norm1.bias
decoder.up.1.block.2.conv1.weight
decoder.up.1.block.2.conv1.bias
decoder.up.1.block.2.norm2.weight
decoder.up.1.block.2.norm2.bias
decoder.up.1.block.2.conv2.weight
decoder.up.1.block.2.conv2.bias
decoder.up.1.upsample.conv.weight
decoder.up.1.upsample.conv.bias
decoder.up.2.block.0.norm1.weight
decoder.up.2.block.0.norm1.bias
decoder.up.2.block.0.conv1.weight
decoder.up.2.block.0.conv1.bias
decoder.up.2.block.0.norm2.weight
decoder.up.2.block.0.norm2.bias
decoder.up.2.block.0.conv2.weight
decoder.up.2.block.0.conv2.bias
decoder.up.2.block.1.norm1.weight
decoder.up.2.block.1.norm1.bias
decoder.up.2.block.1.conv1.weight
decoder.up.2.block.1.conv1.bias
decoder.up.2.block.1.norm2.weight
decoder.up.2.block.1.norm2.bias
decoder.up.2.block.1.conv2.weight
decoder.up.2.block.1.conv2.bias
decoder.up.2.block.2.norm1.weight
decoder.up.2.block.2.norm1.bias
decoder.up.2.block.2.conv1.weight
decoder.up.2.block.2.conv1.bias
decoder.up.2.block.2.norm2.weight
decoder.up.2.block.2.norm2.bias
decoder.up.2.block.2.conv2.weight
decoder.up.2.block.2.conv2.bias
decoder.up.2.upsample.conv.weight
decoder.up.2.upsample.conv.bias
decoder.up.3.block.0.norm1.weight
decoder.up.3.block.0.norm1.bias
decoder.up.3.block.0.conv1.weight
decoder.up.3.block.0.conv1.bias
decoder.up.3.block.0.norm2.weight
decoder.up.3.block.0.norm2.bias
decoder.up.3.block.0.conv2.weight
decoder.up.3.block.0.conv2.bias
decoder.up.3.block.1.norm1.weight
decoder.up.3.block.1.norm1.bias
decoder.up.3.block.1.conv1.weight
decoder.up.3.block.1.conv1.bias
decoder.up.3.block.1.norm2.weight
decoder.up.3.block.1.norm2.bias
decoder.up.3.block.1.conv2.weight
decoder.up.3.block.1.conv2.bias
decoder.up.3.block.2.norm1.weight
decoder.up.3.block.2.norm1.bias
decoder.up.3.block.2.conv1.weight
decoder.up.3.block.2.conv1.bias
decoder.up.3.block.2.norm2.weight
decoder.up.3.block.2.norm2.bias
decoder.up.3.block.2.conv2.weight
decoder.up.3.block.2.conv2.bias
decoder.up.3.upsample.conv.weight
decoder.up.3.upsample.conv.bias
decoder.norm_out.weight
decoder.norm_out.bias
decoder.conv_out.weight
decoder.conv_out.bias
loss.logvar
loss.perceptual_loss.scaling_layer.shift
loss.perceptual_loss.scaling_layer.scale
loss.perceptual_loss.net.slice1.0.weight
loss.perceptual_loss.net.slice1.0.bias
loss.perceptual_loss.net.slice1.2.weight
loss.perceptual_loss.net.slice1.2.bias
loss.perceptual_loss.net.slice2.5.weight
loss.perceptual_loss.net.slice2.5.bias
loss.perceptual_loss.net.slice2.7.weight
loss.perceptual_loss.net.slice2.7.bias
loss.perceptual_loss.net.slice3.10.weight
loss.perceptual_loss.net.slice3.10.bias
loss.perceptual_loss.net.slice3.12.weight
loss.perceptual_loss.net.slice3.12.bias
loss.perceptual_loss.net.slice3.14.weight
loss.perceptual_loss.net.slice3.14.bias
loss.perceptual_loss.net.slice4.17.weight
loss.perceptual_loss.net.slice4.17.bias
loss.perceptual_loss.net.slice4.19.weight
loss.perceptual_loss.net.slice4.19.bias
loss.perceptual_loss.net.slice4.21.weight
loss.perceptual_loss.net.slice4.21.bias
loss.perceptual_loss.net.slice5.24.weight
loss.perceptual_loss.net.slice5.24.bias
loss.perceptual_loss.net.slice5.26.weight
loss.perceptual_loss.net.slice5.26.bias
loss.perceptual_loss.net.slice5.28.weight
loss.perceptual_loss.net.slice5.28.bias
loss.perceptual_loss.lin0.model.1.weight
loss.perceptual_loss.lin1.model.1.weight
loss.perceptual_loss.lin2.model.1.weight
loss.perceptual_loss.lin3.model.1.weight
loss.perceptual_loss.lin4.model.1.weight
loss.discriminator.main.0.weight
loss.discriminator.main.0.bias
loss.discriminator.main.2.weight
loss.discriminator.main.3.weight
loss.discriminator.main.3.bias
loss.discriminator.main.3.running_mean
loss.discriminator.main.3.running_var
loss.discriminator.main.3.num_batches_tracked
loss.discriminator.main.5.weight
loss.discriminator.main.6.weight
loss.discriminator.main.6.bias
loss.discriminator.main.6.running_mean
loss.discriminator.main.6.running_var
loss.discriminator.main.6.num_batches_tracked
loss.discriminator.main.8.weight
loss.discriminator.main.9.weight
loss.discriminator.main.9.bias
loss.discriminator.main.9.running_mean
loss.discriminator.main.9.running_var
loss.discriminator.main.9.num_batches_tracked
loss.discriminator.main.11.weight
loss.discriminator.main.11.bias
quant_conv.weight
quant_conv.bias
post_quant_conv.weight
post_quant_conv.bias
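(Side note, hedged: these look like the original LDM-style key names — encoder.down.0.block... — plus training-only loss.* entries for the perceptual loss and discriminator, which would also explain why a direct pipe.vae.load_state_dict fails: diffusers' AutoencoderKL uses its own naming, e.g. encoder.down_blocks..., and from_single_file performs that conversion. A minimal sketch that drops the training-only entries before handing the checkpoint to diffusers:)

import torch

# Load the raw checkpoint (trusted file), then keep only the VAE weights;
# the "loss.*" entries are training-time objects the VAE doesn't need.
state_dict = torch.load("./assets/vae/orangemix.vae.pt", map_location="cpu", weights_only=False)
state_dict = state_dict.get("state_dict", state_dict)  # unwrap if nested
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("loss.")}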
Which version of diffusers are you using? The snippet I shared is meant to be run with a version newer than 0.27.2. Based on the traceback, it seems like you're using version 0.27.2 to try to load the state dict?
I tried with 0.27.2 and the latest release. The goal is to update to the latest, but diffusers keeps making breaking changes. This PT one is crazy, as many VAEs for SD 1.5 are in .pt format, and many I've found for SDXL are in that format as well.
When an update happens, I don't expect my production app to just break after making zero code changes. I don't use the hub, but I even tried with your hub example. I don't like using the hub, just local files, so I know nothing ever changes. The hub will get updates, and it saves things in mysterious locations where you don't want model files anyway.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not sure if this can help you, but it worked for me: this notebook converts the .pt file to .safetensors, so that you can then load the VAE normally with the from_single_file function. At least, I converted that same VAE:
https://github.com/DiffusionDalmation/pt_to_safetensors_converter_notebook
import torch
from diffusers import AutoencoderKL

# vae_name points to the converted .safetensors file; device and pipeline
# come from the surrounding setup
vae = AutoencoderKL.from_single_file(
    vae_name,
    torch_dtype=torch.float16,
).to(device)
pipeline.vae = vae
I guess I'll have to convert this to not be a notebook and test it out.
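(For what it's worth, the core of that notebook fits in a short script — a minimal sketch, assuming the weights may be nested under a "state_dict" key and that dropping the training-only "loss.*" entries is safe here:)

import torch
from safetensors.torch import save_file

# Deserialize the trusted pickle checkpoint, keep only the model weights,
# and re-save them as safetensors:
sd = torch.load("./assets/vae/orangemix.vae.pt", map_location="cpu", weights_only=False)
sd = sd.get("state_dict", sd)  # unwrap if nested under "state_dict"
sd = {k: v.contiguous() for k, v in sd.items() if not k.startswith("loss.")}
save_file(sd, "./assets/vae/orangemix.vae.safetensors")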
Oh, I didn't think of that — it would not solve your problem in general, since a lot of researchers and model owners still don't publish safetensors files. I can convert the ones you need, if you want and it helps you.
> I tried with 0.27.2 and the latest release. The goal is to update to the latest, but diffusers keeps making breaking changes. This PT one is crazy, as many VAEs for SD 1.5 are in .pt format, and many I've found for SDXL are in that format as well.
The issue is not with the .pt extension, but with the fact that the checkpoints you are trying to load contain "non-weight" serialized objects. We no longer allow loading pickle files that contain these objects, since it is a security risk.
The snippet I shared is only an example of what to do if you have pickle files with these types of objects. You can replace the hub download part with a path to your local file. You have two options:
- Use torch.load to deserialize the checkpoint and then pass the state_dict directly to from_single_file (see the sketch after this list).
- Convert the pickle file to a safetensors file and load it via single file, as @Eduardishion suggested.
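(A minimal sketch of the first option with a local file — no hub involved, using the path from the reproduction above:)

import torch
from diffusers import AutoencoderKL

# Deserialize the local pickle file ourselves, then hand the state dict
# to from_single_file (requires a diffusers release newer than 0.27.2):
state_dict = torch.load("./assets/vae/orangemix.vae.pt", map_location="cpu", weights_only=False)
vae = AutoencoderKL.from_single_file(state_dict, torch_dtype=torch.bfloat16)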
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Marking as closed due to inactivity, and because I think this was sufficiently addressed with the solutions above. Please feel free to re-open if you still need help with this, or open a new issue if we can assist with anything else 🤗
That wasn't really a solution; it's converting things just to fit an overly opinionated design choice. But it doesn't really matter anymore — I've moved on from this project, which seems to enjoy consistently making breaking changes.