
Flux Out Of Memory

Open curlysasha opened this issue 1 year ago • 19 comments

24GB VRAM and 64GB RAM is not enough?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 23.99 GiB total capacity; 50.07 GiB already allocated; 0 bytes free; 53.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
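The allocator hint at the end of that message can be tried before launching the app. A minimal sketch of setting it from Python, assuming nothing beyond the standard library (the value 128 is an illustrative choice, not a recommendation from this thread):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before torch performs its first CUDA
# allocation, i.e. before importing/using torch in this process (or export
# it in the shell before running app_flux.py).
# max_split_size_mb:128 is an example value; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Note that this only mitigates fragmentation; it cannot help when, as above, the allocated total already exceeds the card's capacity.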

curlysasha avatar Sep 13 '24 14:09 curlysasha

Same problem, but on an A6000 with 48GB VRAM. Developers, how do we fix that?

Traceback (most recent call last):
  File "/app/lib/app_flux.py", line 290, in <module>
    demo = create_demo(args, args.name, args.device, args.offload)
  File "/app/lib/app_flux.py", line 170, in create_demo
    generator = FluxGenerator(model_name, device, offload, args)
  File "/app/lib/app_flux.py", line 29, in __init__
    self.model, self.ae, self.t5, self.clip = get_models(
  File "/app/lib/app_flux.py", line 18, in get_models
    model = load_flow_model(name, device="cpu" if offload else device)
  File "/app/lib/flux/util.py", line 127, in load_flow_model
    model = Flux(configs[name].params).to(torch.bfloat16)
  File "/app/lib/flux/model.py", line 74, in __init__
    [
  File "/app/lib/flux/model.py", line 75, in <listcomp>
    SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio)
  File "/app/lib/flux/modules/layers.py", line 225, in __init__
    self.modulation = Modulation(hidden_size, double=False)
  File "/app/lib/flux/modules/layers.py", line 118, in __init__
    self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_device.py", line 62, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 47.52 GiB total capacity; 45.88 GiB already allocated; 66.44 MiB free; 45.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

baleksey avatar Sep 13 '24 15:09 baleksey

@curlysasha @baleksey We have just optimized the code to support consumer-grade GPUs. Please refer to the instructions.

ToTheBeginning avatar Sep 13 '24 17:09 ToTheBeginning

@ToTheBeginning Thank you for the reply! But I still can't run it. With --offload it doesn't show OOM, but after model loading it shows onnxruntime errors while still starting the gradio server. After trying anything in the gradio UI, it stops with another error. I've attached my full log from loading to server stop so you can understand what is going on.

error_log.txt

P.S. I've tried reinstalling onnx-gpu and rerunning the requirements install - nothing helped.

UPDATE: I've managed to run it after reinstalling many, many dependencies (optimum-quanto required a much higher torch version). And here is what I got in the end: result_pulid

Is it possible for you to create a Dockerfile with tested packages and environment so it works 100% of the time? It's pretty hard to run PuLID locally..

baleksey avatar Sep 13 '24 18:09 baleksey

> @curlysasha @baleksey We have just optimized the code to support consumer-grade GPUs. Please refer to the instructions.

(pulid) C:\GIT\PuLID>python app_flux.py --offload --fp8
C:\Users\user.conda\envs\pulid\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\user.conda\envs\pulid\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
Please 'pip install xformers'
Please 'pip install apex'
Please 'pip install xformers'
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 6.78it/s]
Traceback (most recent call last):
  File "C:\GIT\PuLID\app_flux.py", line 310, in <module>
    demo = create_demo(args, args.name, args.device, args.offload, args.aggressive_offload)
  File "C:\GIT\PuLID\app_flux.py", line 185, in create_demo
    generator = FluxGenerator(model_name, device, offload, aggressive_offload, args)
  File "C:\GIT\PuLID\app_flux.py", line 39, in __init__
    self.model, self.ae, self.t5, self.clip = get_models(
  File "C:\GIT\PuLID\app_flux.py", line 22, in get_models
    t5 = load_t5(device, max_length=128)
  File "C:\GIT\PuLID\flux\util.py", line 165, in load_t5
    return HFEmbedder("xlabs-ai/xflux_text_encoders", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\cuda\__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
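The "Torch not compiled with CUDA enabled" assertion means the installed torch wheel is a CPU-only build. A small stdlib-only diagnostic sketch (the classification of builds via `torch.version.cuda` is standard PyTorch behavior; everything else here is illustrative):

```python
import importlib.util

def torch_build_status() -> str:
    """Classify the installed torch build; CPU-only wheels have torch.version.cuda == None."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if torch.version.cuda is None:
        # This build is what raises "Torch not compiled with CUDA enabled"
        return "cpu-only build"
    return f"cuda build ({torch.version.cuda}), device available: {torch.cuda.is_available()}"

print(torch_build_status())
```

A CPU-only build is typically fixed by reinstalling torch from the CUDA wheel index (e.g. `pip install torch --index-url https://download.pytorch.org/whl/cu121`); the exact cuXXX tag depends on your driver and is an assumption here.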

curlysasha avatar Sep 13 '24 20:09 curlysasha

optimum-quanto 0.2.4 requires torch>=2.4.0, but you have torch 2.0.1 which is incompatible.
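That message is pip's version check failing: optimum-quanto 0.2.4 pins torch>=2.4.0 while torch 2.0.1 is installed. The comparison can be reproduced with the standard library; the helper below is an illustrative sketch, not pip's actual resolver:

```python
from importlib import metadata

def meets_minimum(installed: str, minimum: tuple) -> bool:
    """Crude >= comparison on the leading numeric components of a version string."""
    parts = tuple(int(p) for p in installed.split("+")[0].split(".")[:len(minimum)])
    return parts >= minimum

# optimum-quanto 0.2.4 requires torch>=2.4.0 (from the error above)
assert not meets_minimum("2.0.1", (2, 4))   # the installed torch is too old
assert meets_minimum("2.4.1", (2, 4))       # upgrading torch resolves the conflict

try:
    print("installed torch:", metadata.version("torch"))
except metadata.PackageNotFoundError:
    print("torch not installed in this environment")
```

So the fix is to upgrade torch to at least 2.4.0 (with a CUDA build, per the other error in this thread) rather than to downgrade optimum-quanto.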

curlysasha avatar Sep 13 '24 20:09 curlysasha

@baleksey Based on the error log you provided, here are some possible reasons for the issues you're facing, even though I haven't encountered these problems myself:

The model throws an error when loading the VAE. It looks like you provided the VAE for "sdxl" instead of the VAE for "flux-dev".

Regarding the errors related to onnxruntime, you might want to check if there are similar issues in the community.

Lastly, for the errors concerning pydantic, you can refer to another issue: https://github.com/ToTheBeginning/PuLID/issues/61

ToTheBeginning avatar Sep 14 '24 04:09 ToTheBeginning

@curlysasha Thanks, updated now.

ToTheBeginning avatar Sep 14 '24 04:09 ToTheBeginning

> @curlysasha Thanks, updated now.

I'm recreating the env with requirements_fp8 but nothing changed ( maybe I need to put the fp8 model in place?)

(pulid) C:\GIT\PuLID>python app_flux.py --offload --fp8
Please 'pip install xformers'
Please 'pip install apex'
Please 'pip install xformers'
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 7.26it/s]
Traceback (most recent call last):
  File "C:\GIT\PuLID\app_flux.py", line 310, in <module>
    demo = create_demo(args, args.name, args.device, args.offload, args.aggressive_offload)
  File "C:\GIT\PuLID\app_flux.py", line 185, in create_demo
    generator = FluxGenerator(model_name, device, offload, aggressive_offload, args)
  File "C:\GIT\PuLID\app_flux.py", line 39, in __init__
    self.model, self.ae, self.t5, self.clip = get_models(
  File "C:\GIT\PuLID\app_flux.py", line 22, in get_models
    t5 = load_t5(device, max_length=128)
  File "C:\GIT\PuLID\flux\util.py", line 165, in load_t5
    return HFEmbedder("xlabs-ai/xflux_text_encoders", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\cuda\__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

curlysasha avatar Sep 14 '24 07:09 curlysasha

@ToTheBeginning Thank you! It was the wrong VAE; now it works.

@curlysasha Here are my current "frozen" python libs, after I reinstalled many packages and finally got it working. Hope it helps you as well: requirements.txt

baleksey avatar Sep 14 '24 07:09 baleksey

> @ToTheBeginning Thank you! It was the wrong VAE; now it works.
>
> @curlysasha Here are my current "frozen" python libs, after I reinstalled many packages and finally got it working. Hope it helps you as well: requirements.txt

ERROR: Ignored the following yanked versions: 8.9.4.19
ERROR: Ignored the following versions that require a different python version: 0.36.0 Requires-Python >=3.6,<3.10; 0.37.0 Requires-Python >=3.7,<3.10; 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 0.53.0 Requires-Python >=3.6,<3.10; 0.53.0rc1.post1 Requires-Python >=3.6,<3.10; 0.53.0rc2 Requires-Python >=3.6,<3.10; 0.53.0rc3 Requires-Python >=3.6,<3.10; 0.53.1 Requires-Python >=3.6,<3.10; 0.54.0 Requires-Python >=3.7,<3.10; 0.54.0rc2 Requires-Python >=3.7,<3.10; 0.54.0rc3 Requires-Python >=3.7,<3.10; 0.54.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement nvidia-cudnn-cu11==8.5.0.96 (from versions: 0.0.1.dev5, 8.9.4.25, 8.9.5.29, 9.0.0.312, 9.1.0.70, 9.1.1.17, 9.2.0.82, 9.2.1.18, 9.3.0.75, 9.4.0.58, 2021.10.26, 2021.11.18, 2021.12.8, 2022.1.13, 2022.4.2, 2022.5.19)
ERROR: No matching distribution found for nvidia-cudnn-cu11==8.5.0.96

Python is 3.10.

curlysasha avatar Sep 14 '24 07:09 curlysasha

We have updated the code; PuLID-FLUX can now run on 16GB consumer-grade cards.

ToTheBeginning avatar Sep 14 '24 09:09 ToTheBeginning

> We have updated the code; PuLID-FLUX can now run on 16GB consumer-grade cards.

Maybe I need to put the fp8 model in place? I think it's not downloaded automatically.

curlysasha avatar Sep 14 '24 09:09 curlysasha

> Maybe I need to put the fp8 model in place? I think it's not downloaded automatically.

It is auto downloaded from https://huggingface.co/XLabs-AI/flux-dev-fp8.
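If the automatic fetch fails, the checkpoint can be pulled by hand. A sketch that just builds the Hugging Face direct-download ("resolve") URL; the filename is an assumption, so check the repo page for the actual file:

```python
# Constructs a direct-download URL for a file in a Hugging Face repo.
# The repo id comes from the comment above; "flux-dev-fp8.safetensors" is an
# assumed filename -- verify it on https://huggingface.co/XLabs-AI/flux-dev-fp8
REPO_ID = "XLabs-AI/flux-dev-fp8"
FILENAME = "flux-dev-fp8.safetensors"  # assumed; check the repo's file list

def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hugging Face serves raw files at /<repo>/resolve/<revision>/<file>."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(resolve_url(REPO_ID, FILENAME))
# Download with e.g. `curl -L -O <url>` and place the file where the app
# expects its checkpoints.
```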

ToTheBeginning avatar Sep 14 '24 09:09 ToTheBeginning

> Maybe I need to put the fp8 model in place? I think it's not downloaded automatically.
>
> It is auto downloaded from https://huggingface.co/XLabs-AI/flux-dev-fp8.

no (

(pulid) C:\GIT\PuLID>python app_flux.py --offload --fp8
Please 'pip install xformers'
Please 'pip install apex'
Please 'pip install xformers'
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 7.52it/s]
Traceback (most recent call last):
  File "C:\GIT\PuLID\app_flux.py", line 325, in <module>
    demo = create_demo(args, args.name, args.device, args.offload, args.aggressive_offload)
  File "C:\GIT\PuLID\app_flux.py", line 197, in create_demo
    generator = FluxGenerator(model_name, device, offload, aggressive_offload, args)
  File "C:\GIT\PuLID\app_flux.py", line 39, in __init__
    self.model, self.ae, self.t5, self.clip = get_models(
  File "C:\GIT\PuLID\app_flux.py", line 22, in get_models
    t5 = load_t5(device, max_length=128)
  File "C:\GIT\PuLID\flux\util.py", line 165, in load_t5
    return HFEmbedder("xlabs-ai/xflux_text_encoders", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
  File "C:\Users\user.conda\envs\pulid\lib\site-packages\torch\cuda\__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Now I'm trying to delete the text encoders folder, maybe that's the problem (it wasn't).

curlysasha avatar Sep 14 '24 09:09 curlysasha

The fp8 model is not downloaded.

curlysasha avatar Sep 14 '24 10:09 curlysasha

@ToTheBeginning can you fix this?

curlysasha avatar Sep 17 '24 07:09 curlysasha

Keep getting this error...

(pulid) PS E:\Pulid\PuLID> python app_flux.py --offload --fp8
Please 'pip install apex'
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 6.95it/s]
Traceback (most recent call last):
  File "E:\Pulid\PuLID\app_flux.py", line 325, in <module>
    demo = create_demo(args, args.name, args.device, args.offload, args.aggressive_offload)
  File "E:\Pulid\PuLID\app_flux.py", line 197, in create_demo
    generator = FluxGenerator(model_name, device, offload, aggressive_offload, args)
  File "E:\Pulid\PuLID\app_flux.py", line 39, in __init__
    self.model, self.ae, self.t5, self.clip = get_models(
  File "E:\Pulid\PuLID\app_flux.py", line 22, in get_models
    t5 = load_t5(device, max_length=128)
  File "E:\Pulid\PuLID\flux\util.py", line 165, in load_t5
    return HFEmbedder("xlabs-ai/xflux_text_encoders", max_length=max_length, torch_dtype=torch.bfloat16).to(device)
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
  File "C:\Users\User\miniconda3\envs\pulid\lib\site-packages\torch\cuda\__init__.py", line 305, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Fearganainim avatar Sep 24 '24 10:09 Fearganainim

Packages installed in env

packages in environment at C:\Users\User\miniconda3\envs\pulid:

Name Version Build Channel

accelerate 0.34.2 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
albucore 0.0.17 pypi_0 pypi
albumentations 1.4.16 pypi_0 pypi
altair 5.4.1 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.6.0 pypi_0 pypi
attrs 24.2.0 pypi_0 pypi
blas 1.0 mkl
brotli-python 1.0.9 py310hd77b12b_8
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2024.7.2 haa95532_0
certifi 2024.8.30 py310haa95532_0
charset-normalizer 3.3.2 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.3.0 pypi_0 pypi
cudatoolkit 11.3.1 h59b6b97_2
cycler 0.12.1 pypi_0 pypi
cython 3.0.11 pypi_0 pypi
diffusers 0.30.0 pypi_0 pypi
easydict 1.13 pypi_0 pypi
einops 0.8.0 pypi_0 pypi
eval-type-backport 0.2.0 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
facexlib 0.3.0 pypi_0 pypi
fastapi 0.115.0 pypi_0 pypi
ffmpy 0.4.0 pypi_0 pypi
filelock 3.16.1 pypi_0 pypi
filterpy 1.4.5 pypi_0 pypi
flatbuffers 24.3.25 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
freetype 2.12.1 ha860e81_0
fsspec 2024.9.0 pypi_0 pypi
ftfy 6.2.3 pypi_0 pypi
gmpy2 2.1.2 py310h7f96b67_0
gradio 4.19.1 pypi_0 pypi
gradio-client 0.10.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 0.16.3 pypi_0 pypi
httpx 0.23.3 pypi_0 pypi
huggingface-hub 0.25.0 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.10 pypi_0 pypi
imageio 2.35.1 pypi_0 pypi
importlib-metadata 8.5.0 pypi_0 pypi
importlib-resources 6.4.5 pypi_0 pypi
insightface 0.7.3 pypi_0 pypi
intel-openmp 2023.1.0 h59b6b97_46320
jinja2 3.1.4 py310haa95532_0
joblib 1.4.2 pypi_0 pypi
jpeg 9e h827c3e9_3
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.7 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
lcms2 2.12 h83e58a3_0
lerc 3.0 hd77b12b_0
libdeflate 1.17 h2bbff1b_1
libffi 3.4.4 hd77b12b_1
libjpeg-turbo 2.0.0 h196d8e1_0
libpng 1.6.39 h8cc25b3_0
libtiff 4.5.1 hd77b12b_0
libuv 1.48.0 h827c3e9_0
libwebp-base 1.3.2 h2bbff1b_0
llvmlite 0.43.0 pypi_0 pypi
lz4-c 1.9.4 h2bbff1b_1
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.9.2 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h6b88ed4_46358
mkl-service 2.4.0 py310h2bbff1b_1
mkl_fft 1.3.10 py310h827c3e9_0
mkl_random 1.2.7 py310hc64d2fc_0
mpc 1.1.0 h7edee0f_1
mpfr 4.0.2 h62dcd97_1
mpir 3.0.0 hec2e145_1
mpmath 1.3.0 py310haa95532_0
narwhals 1.8.2 pypi_0 pypi
networkx 3.3 py310haa95532_0
ninja 1.11.1.1 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numpy 1.26.4 py310h055cbcc_0
numpy-base 1.26.4 py310h65a83cf_0
onnx 1.16.2 pypi_0 pypi
onnxruntime 1.19.2 pypi_0 pypi
onnxruntime-gpu 1.19.2 pypi_0 pypi
opencv-python 4.10.0.84 pypi_0 pypi
opencv-python-headless 4.10.0.84 pypi_0 pypi
openjpeg 2.5.2 hae555c5_0
openssl 3.0.15 h827c3e9_0
optimum-quanto 0.2.4 pypi_0 pypi
orjson 3.10.7 pypi_0 pypi
packaging 24.1 pypi_0 pypi
pandas 2.2.3 pypi_0 pypi
pillow 10.4.0 py310h827c3e9_0
pip 24.2 py310haa95532_0
prettytable 3.11.0 pypi_0 pypi
protobuf 5.28.2 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
pydantic 2.9.2 pypi_0 pypi
pydantic-core 2.23.4 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyparsing 3.1.4 pypi_0 pypi
pyreadline3 3.5.4 pypi_0 pypi
pysocks 1.7.1 py310haa95532_0
python 3.10.14 he1021f5_1
python-dateutil 2.9.0.post0 pypi_0 pypi
python-multipart 0.0.10 pypi_0 pypi
pytorch-mutex 1.0 cpu pytorch
pytz 2024.2 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
referencing 0.35.1 pypi_0 pypi
regex 2024.9.11 pypi_0 pypi
requests 2.32.3 py310haa95532_0
rfc3986 1.5.0 pypi_0 pypi
rich 13.8.1 pypi_0 pypi
rpds-py 0.20.0 pypi_0 pypi
ruff 0.6.7 pypi_0 pypi
safetensors 0.4.5 pypi_0 pypi
scikit-image 0.24.0 pypi_0 pypi
scikit-learn 1.5.2 pypi_0 pypi
scipy 1.14.1 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 75.1.0 py310haa95532_0
shellingham 1.5.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
sqlite 3.45.3 h2bbff1b_0
starlette 0.38.6 pypi_0 pypi
sympy 1.13.3 pypi_0 pypi
tbb 2021.8.0 h59b6b97_0
threadpoolctl 3.5.0 pypi_0 pypi
tifffile 2024.9.20 pypi_0 pypi
timm 1.0.9 pypi_0 pypi
tk 8.6.14 h0416ee5_0
tokenizers 0.19.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
torch 2.4.1 pypi_0 pypi
torchaudio 0.13.0 py310_cpu pytorch
torchvision 0.19.1 pypi_0 pypi
tqdm 4.66.5 pypi_0 pypi
transformers 4.43.3 pypi_0 pypi
typer 0.12.5 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
typing_extensions 4.11.0 py310haa95532_0
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.3 pypi_0 pypi
uvicorn 0.30.6 pypi_0 pypi
vc 14.40 h2eaa2aa_1
vs2015_runtime 14.40.33807 h98bb1dd_1
wcwidth 0.2.13 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
wheel 0.44.0 py310haa95532_0
win_inet_pton 1.1.0 py310haa95532_0
xformers 0.0.28.post1 pypi_0 pypi
xz 5.4.6 h8cc25b3_1
yaml 0.2.5 he774522_0
zipp 3.20.2 pypi_0 pypi
zlib 1.2.13 h8cc25b3_1
zstd 1.5.5 hd43e919_2

Fearganainim avatar Sep 24 '24 10:09 Fearganainim

@curlysasha Any results with your issue? Mine seems to be the same.

Fearganainim avatar Sep 24 '24 10:09 Fearganainim