
Tried to implement first block cache from ParaAttention, got a slight improvement in speed.

testdummyvt opened this issue 10 months ago · 8 comments

Hello, I tried to implement FBCache (first block cache), as implemented in ParaAttention.

My repo: diffsynth-wan-cache

I was able to improve inference speed without much loss of visual quality.

I hope you guys try it out. Let me know if it works for you and if any changes need to be made.

I could not run I2V on my setup and had some issues loading the models. If someone can try it, please do.
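For context, the core of FBCache is a small check, so here is a minimal sketch of the idea, assuming a generic DiT forward loop (all names here are illustrative, not the actual diffsynth-wan-cache code). The 0.04/0.07/0.09 values in the table below are this relative-change threshold: the higher it is, the more steps get skipped.

```python
import torch

# Minimal sketch of first block cache (FBCache), as popularized by
# ParaAttention. Names (blocks, threshold, cache) are illustrative.
cache = {"first_residual": None, "rest_residual": None}

def dit_forward(hidden_states, timestep_emb, blocks, threshold=0.07):
    # Always run the first transformer block; its residual is the cheap probe.
    first_out = blocks[0](hidden_states, timestep_emb)
    first_residual = first_out - hidden_states

    prev = cache["first_residual"]
    if prev is not None:
        # Relative L1 change of the first-block residual vs. the previous step.
        rel_change = (first_residual - prev).abs().mean() / prev.abs().mean()
        if rel_change < threshold:
            # Similar enough: skip the remaining blocks and reuse their
            # cached contribution from the last fully computed step.
            cache["first_residual"] = first_residual
            return first_out + cache["rest_residual"]

    # Otherwise compute the full forward pass and refresh the cache.
    out = first_out
    for block in blocks[1:]:
        out = block(out, timestep_emb)
    cache["first_residual"] = first_residual
    cache["rest_residual"] = out - first_out
    return out
```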

Wan 2.1 T2V 14B

Ran it on an RTX 4090 (24 GB VRAM).

| No FBCache | 0.04 FBCache | 0.07 FBCache | 0.09 FBCache |
| --- | --- | --- | --- |
| 1216.17 sec | 1090.77 sec (1.11X) | 657.64 sec (1.85X) | 359.47 sec (Not usable) |
Videos are available in the assets below.

No FBCache result

https://github.com/user-attachments/assets/c9ca618b-9295-496a-a10b-be6f4655f963

0.07 FBCache result

https://github.com/user-attachments/assets/08937e38-9bce-47fd-b330-95ac4bf348d5

testdummyvt · Mar 12 '25 21:03

Any chance you could implement the official TeaCache?

This looks like a big quality drop @testdummyvt

Also, if your repo were a fork of DiffSynth-Studio, I was going to test it.

FurkanGozukara · Mar 12 '25 21:03

@FurkanGozukara I will look into TeaCache and will update if I am able to.

> Also, if your repo were a fork of DiffSynth-Studio, I was going to test it.

Once you install DiffSynth-Studio, you can run the .py files from my repo. It just calls diffsynth, the same as the examples/wanvideo files but with the FBCache code added.
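Roughly, the scripts do something like this (a sketch based on the examples/wanvideo pattern; `apply_fbcache` is a hypothetical stand-in for the added caching code, and the exact model file paths depend on how you downloaded Wan 2.1):

```python
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video

# Load Wan 2.1 T2V 14B the same way the examples/wanvideo scripts do.
model_manager = ModelManager(device="cpu")
model_manager.load_models(
    [
        "models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model.safetensors",
        "models/Wan-AI/Wan2.1-T2V-14B/models_t5_umt5-xxl-enc-bf16.pth",
        "models/Wan-AI/Wan2.1-T2V-14B/Wan2.1_VAE.pth",
    ],
    torch_dtype=torch.bfloat16,
)
pipe = WanVideoPipeline.from_model_manager(model_manager, device="cuda")

# Hypothetical stand-in for the FBCache patch applied to the DiT:
# apply_fbcache(pipe.dit, residual_diff_threshold=0.07)

video = pipe(prompt="a cat surfing a wave", num_inference_steps=50, seed=0)
save_video(video, "video.mp4", fps=15, quality=5)
```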

testdummyvt · Mar 12 '25 21:03

> @FurkanGozukara I will look into TeaCache and will update if I am able to.
>
> > Also, if your repo were a fork of DiffSynth-Studio, I was going to test it.
>
> Once you install DiffSynth-Studio, you can run the .py files from my repo. It just calls diffsynth, the same as the examples/wanvideo files but with the FBCache code added.

I have my own app implementation. I use DiffSynth-Studio to load models and run inference in my app.

FurkanGozukara · Mar 12 '25 21:03

@FurkanGozukara @testdummyvt We are very cautious about adopting inference acceleration technologies. "There is no such thing as a free lunch." Almost all similar technologies come at the cost of visual quality, and we are still conducting experiments to compare the effectiveness of these technologies.

Artiprocher · Mar 14 '25 01:03

> @FurkanGozukara @testdummyvt We are very cautious about adopting inference acceleration technologies. "There is no such thing as a free lunch." Almost all similar technologies come at the cost of visual quality, and we are still conducting experiments to compare the effectiveness of these technologies.

100% that is the case. Does DiffSynth-Studio currently support GGUF? I tested the single unified DiT for Wan 2.1 and it worked, but I haven't tested GGUF yet; I'm planning to. People are telling me that GGUF fits the 14B models into 32 GB RAM and 16 GB VRAM, and are requesting that I support it.

For example, the Q6 GGUF of Wan 2.1.

GGUF models here @Artiprocher: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main
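For a back-of-envelope check on those numbers (taking as given that Q6_K stores about 6.5625 bits per weight in the llama.cpp k-quant layout):

```python
# Rough size estimate for a Q6_K-quantized 14B model.
params = 14e9
bits_per_weight = 6.5625  # Q6_K (llama.cpp k-quants)
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~11.5 GB, so fitting in 16 GB VRAM is plausible
```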

FurkanGozukara · Mar 14 '25 08:03

@FurkanGozukara Hello. Did you test it? Is GGUF supported?

fkjkey · Mar 17 '25 03:03

> @FurkanGozukara Hello. Did you test it? Is GGUF supported?

Sadly, it failed @Artiprocher

```
WAN 2.1 14B Text-to-Video
[CMD] Loading model: 14B_text with torch dtype: torch.bfloat16 and num_persistent_param_in_dit: 4250000000
Loading models from: ['models\\Wan-AI\\Wan2.1-T2V-14B\\wan2.1-t2v-14b-Q6_K.gguf']
Traceback (most recent call last):
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 2103, in process_api
    result = await self.call_function(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 1650, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
    result = context.run(func, *args)
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\utils.py", line 890, in wrapper
    response = f(*args, **kwargs)
  File "E:\Wan21_v3\Wan2.1\App.py", line 749, in generate_videos
    loaded_pipeline = load_wan_pipeline(model_choice, torch_dtype, vram_value, lora_path=effective_loras, lora_alpha=None)
  File "E:\Wan21_v3\Wan2.1\App.py", line 1209, in load_wan_pipeline
    model_manager.load_models(
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 426, in load_models
    self.load_model(file_path, model_names, device=device, torch_dtype=torch_dtype)
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 402, in load_model
    state_dict.update(load_state_dict(path))
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 69, in load_state_dict
    return load_state_dict_from_bin(file_path, torch_dtype=torch_dtype)
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 83, in load_state_dict_from_bin
    state_dict = torch.load(file_path, map_location="cpu", weights_only=True)
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1548, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 4

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
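This failure makes sense in hindsight: a .gguf file is not a pickle (or a zip checkpoint), so `torch.load` cannot parse it no matter how `weights_only` is set. A minimal guard sketch, relying only on the fact that GGUF files start with the 4-byte magic b"GGUF":

```python
def is_gguf(file_path: str) -> bool:
    # GGUF files begin with the 4-byte magic b"GGUF"; pickle/zip checkpoints
    # do not, so this cheaply routes files away from torch.load.
    with open(file_path, "rb") as f:
        return f.read(4) == b"GGUF"

path = "models/Wan-AI/Wan2.1-T2V-14B/wan2.1-t2v-14b-Q6_K.gguf"
if is_gguf(path):
    raise ValueError(f"{path} is GGUF; it needs a GGUF reader, not torch.load")
```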

FurkanGozukara · Mar 18 '25 14:03

I changed the pickle loading in utils; here is the new error @Artiprocher

```
WAN 2.1 14B Text-to-Video
[CMD] Loading model: 14B_text with torch dtype: torch.bfloat16 and num_persistent_param_in_dit: 4250000000
Loading models from: ['models\\Wan-AI\\Wan2.1-T2V-14B\\wan2.1-t2v-14b-Q6_K.gguf']
Traceback (most recent call last):
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 2103, in process_api
    result = await self.call_function(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\blocks.py", line 1650, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 962, in run
    result = context.run(func, *args)
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\gradio\utils.py", line 890, in wrapper
    response = f(*args, **kwargs)
  File "E:\Wan21_v3\Wan2.1\App.py", line 749, in generate_videos
    loaded_pipeline = load_wan_pipeline(model_choice, torch_dtype, vram_value, lora_path=effective_loras, lora_alpha=None)
  File "E:\Wan21_v3\Wan2.1\App.py", line 1209, in load_wan_pipeline
    model_manager.load_models(
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 426, in load_models
    self.load_model(file_path, model_names, device=device, torch_dtype=torch_dtype)
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\model_manager.py", line 402, in load_model
    state_dict.update(load_state_dict(path))
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 69, in load_state_dict
    return load_state_dict_from_bin(file_path, torch_dtype=torch_dtype)
  File "E:\Wan21_v3\Wan2.1\DiffSynth-Studio\diffsynth\models\utils.py", line 83, in load_state_dict_from_bin
    state_dict = torch.load(file_path, map_location="cpu", weights_only=False)
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1549, in load
    return _legacy_load(
  File "E:\Wan21_v3\Wan2.1\venv\lib\site-packages\torch\serialization.py", line 1797, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x04'.
```
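Same root cause as before: the legacy pickle loader chokes on GGUF's binary header. Actually supporting these files would mean parsing them with a GGUF reader and dequantizing the tensors into torch dtypes, roughly like this (a sketch assuming the `gguf` pip package's GGUFReader API; mapping tensor names into DiffSynth's state dict is the real work and is omitted):

```python
from gguf import GGUFReader  # pip install gguf

# Sketch: enumerate tensors from a GGUF checkpoint. Quantized tensors
# (Q6_K etc.) still need dequantization before they can populate a
# torch state dict; that step is omitted here.
reader = GGUFReader("wan2.1-t2v-14b-Q6_K.gguf")
for tensor in reader.tensors:
    # tensor.data is the raw (possibly quantized) buffer as a numpy array.
    print(tensor.name, tensor.tensor_type, tensor.shape)
```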

FurkanGozukara · Mar 18 '25 14:03