Tried to implement First Block Cache from ParaAttention; got a decent improvement in speed
Hello, I tried to implement FBCache, following the implementation in ParaAttention.
My repo: diffsynth-wan-cache
I was able to improve speed without much loss of visual quality.
I hope you try it out. Let me know if it works for you and whether any changes are needed.
I could not run I2V on my setup and had some issues loading the models, so if someone can try it, please do.
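For context, here is a minimal sketch of the first-block-cache idea as I understand it from ParaAttention (illustrative only: the real Wan DiT blocks also take timestep and context arguments):

```python
import torch

def fbcache_forward(blocks, x, cache, threshold=0.07):
    # Always run the first transformer block.
    first_out = blocks[0](x)
    first_residual = first_out - x

    prev = cache.get("first_residual")
    if prev is not None:
        # Relative L1 change of the first block's residual vs. the previous step.
        rel_diff = (first_residual - prev).abs().mean() / prev.abs().mean()
        if rel_diff < threshold:
            # Step is similar enough: skip the remaining blocks and
            # reuse the cached residual of the rest of the network.
            return first_out + cache["tail_residual"]

    # Otherwise do the full forward and refresh the cache.
    hidden = first_out
    for block in blocks[1:]:
        hidden = block(hidden)
    cache["first_residual"] = first_residual
    cache["tail_residual"] = hidden - first_out
    return hidden
```

The 0.04/0.07/0.09 values in the table below are this threshold: higher values skip more steps, so generation is faster but quality drops more.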
Wan 2.1 T2V 14B
Ran it on an RTX 4090 (24 GB VRAM).
| No FBCache | FBCache (threshold 0.04) | FBCache (threshold 0.07) | FBCache (threshold 0.09) |
|---|---|---|---|
| 1216.17 sec | 1090.77 sec (1.11x) | 657.64 sec (1.85x) | 359.47 sec (not usable) |

Videos are available in `assets`.
No FBCache result
https://github.com/user-attachments/assets/c9ca618b-9295-496a-a10b-be6f4655f963
0.07 FBCache result
https://github.com/user-attachments/assets/08937e38-9bce-47fd-b330-95ac4bf348d5
Any chance you could implement the official TeaCache?
This looks like a big quality drop @testdummyvt.
Also, if your repo were a fork of DiffSynth-Studio, I was going to test it.
@FurkanGozukara I will look into TeaCache; I will update if I am able to.

> Also, if your repo were a fork of DiffSynth-Studio, I was going to test it.

Once you install DiffSynth-Studio, you can run the .py files from my repo. They just call DiffSynth, the same as the examples/wanvideo files, but with the FBCache code added.
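Roughly, the flow is the stock examples/wanvideo script plus one extra call (a simplified sketch: the model file list is abbreviated, and `apply_fbcache` is a stand-in name for the patching helper in my repo):

```python
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

model_manager = ModelManager(device="cpu")
model_manager.load_models(
    [
        "models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model.safetensors",  # DiT (sharded in practice)
        "models/Wan-AI/Wan2.1-T2V-14B/models_t5_umt5-xxl-enc-bf16.pth",      # text encoder
        "models/Wan-AI/Wan2.1-T2V-14B/Wan2.1_VAE.pth",                       # VAE
    ],
    torch_dtype=torch.bfloat16,
)
pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
pipe.enable_vram_management(num_persistent_param_in_dit=None)

# The only addition over the stock example: patch the DiT forward with FBCache.
apply_fbcache(pipe.dit, threshold=0.07)  # stand-in for the hook in my repo

video = pipe(prompt="a cat walking on grass", num_inference_steps=50, seed=0, tiled=True)
save_video(video, "video.mp4", fps=15, quality=5)
```

Everything else is unchanged, so the cache drops in without modifying DiffSynth-Studio itself.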
I have my own app implementation; I use DiffSynth-Studio to load models and run inference in my app.
@FurkanGozukara @testdummyvt We are very cautious about adopting inference acceleration technologies. "There is no such thing as a free lunch." Almost all similar technologies come at the cost of visual quality, and we are still conducting experiments to compare the effectiveness of these technologies.
100% that is the case. Does DiffSynth-Studio currently support GGUF? I tested the single unified DiT for Wan 2.1 and it worked, but I haven't tested GGUF yet; I am planning to. People tell me that with GGUF the 14B models fit into 32 GB RAM and 16 GB VRAM, and they are requesting that I support it, e.g. a Q6 GGUF of Wan 2.1.
GGUF models here @Artiprocher: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
@FurkanGozukara Hello. Did you test it? Is GGUF supported?
Sadly, it failed @Artiprocher:
```
WAN 2.1 14B Text-to-Video
[CMD] Loading model: 14B_text with torch dtype: torch.bfloat16 and num_persistent_param_in_dit: 4250000000
Loading models from: ['models\\Wan-AI\\Wan2.1-T2V-14B\\wan2.1-t2v-14b-Q6_K.gguf']
Traceback (most recent call last):
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 2103, in process_api
result = await self.call_function(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 1650, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
return await future
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
result = context.run(func, *args)
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\utils.py", line 890, in wrapper
response = f(*args, **kwargs)
File "E:\Wan21_v3\Wan2.1\App.py", line 749, in generate_videos
loaded_pipeline = load_wan_pipeline(model_choice, torch_dtype, vram_value, lora_path=effective_loras, lora_alpha=None)
File "E:\Wan21_v3\Wan2.1\App.py", line 1209, in load_wan_pipeline
model_manager.load_models(
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 426, in load_models
self.load_model(file_path, model_names, device=device, torch_dtype=torch_dtype)
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 402, in load_model
state_dict.update(load_state_dict(path))
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 69, in load_state_dict
return load_state_dict_from_bin(file_path, torch_dtype=torch_dtype)
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 83, in load_state_dict_from_bin
state_dict = torch.load(file_path, map_location="cpu", weights_only=True)
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1548, in load
raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 4
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
I changed the pickle loading in utils; here is the new error, @Artiprocher:
```
WAN 2.1 14B Text-to-Video
[CMD] Loading model: 14B_text with torch dtype: torch.bfloat16 and num_persistent_param_in_dit: 4250000000
Loading models from: ['models\\Wan-AI\\Wan2.1-T2V-14B\\wan2.1-t2v-14b-Q6_K.gguf']
Traceback (most recent call last):
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 2103, in process_api
result = await self.call_function(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 1650, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
return await future
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
result = context.run(func, *args)
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\utils.py", line 890, in wrapper
response = f(*args, **kwargs)
File "E:\Wan21_v3\Wan2.1\App.py", line 749, in generate_videos
loaded_pipeline = load_wan_pipeline(model_choice, torch_dtype, vram_value, lora_path=effective_loras, lora_alpha=None)
File "E:\Wan21_v3\Wan2.1\App.py", line 1209, in load_wan_pipeline
model_manager.load_models(
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 426, in load_models
self.load_model(file_path, model_names, device=device, torch_dtype=torch_dtype)
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 402, in load_model
state_dict.update(load_state_dict(path))
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 69, in load_state_dict
return load_state_dict_from_bin(file_path, torch_dtype=torch_dtype)
File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 83, in load_state_dict_from_bin
state_dict = torch.load(file_path, map_location="cpu", weights_only=False)
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1549, in load
return _legacy_load(
File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1797, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x04'.
```
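Looking at the two tracebacks, the failure makes sense: GGUF is llama.cpp's own binary container, not a pickle- or zip-based torch checkpoint, so `torch.load` cannot parse it regardless of the `weights_only` setting. Supporting these files would need a dedicated reader. A rough sketch with the `gguf` package (an assumption on my part: quantized tensors such as Q6_K still need a separate dequantization step, e.g. via `gguf.quants.dequantize` in recent releases):

```python
# pip install gguf
import torch
from gguf import GGUFReader, GGMLQuantizationType

def load_gguf_state_dict(file_path):
    # Parse the GGUF container instead of calling torch.load on it.
    reader = GGUFReader(file_path)
    state_dict = {}
    for t in reader.tensors:
        if t.tensor_type in (GGMLQuantizationType.F32, GGMLQuantizationType.F16):
            # Plain float tensors map straight into torch.
            state_dict[t.name] = torch.from_numpy(t.data.copy())
        else:
            # Quantized blocks (Q6_K, Q4_K, ...) are raw block data here and
            # must be dequantized to floats before the model can use them.
            raise NotImplementedError(f"{t.name}: {t.tensor_type.name} needs dequantization")
    return state_dict
```

Even then, the GGUF tensor names would likely need remapping to the state-dict keys DiffSynth-Studio expects.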