DiffSynth-Studio
HunyuanVideo ValueError: Image features and image tokens do not match: tokens: 1, features 2359296
How can I fix this?
Traceback (most recent call last):
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/hunyuanvideo_i2v_24G.py", line 43, in <module>
video = pipe(prompt, input_images=images, num_inference_steps=50, seed=0, i2v_resolution=i2v_resolution)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/diffsynth/pipelines/hunyuan_video.py", line 190, in __call__
prompt_emb_posi = self.encode_prompt(prompt, positive=True, input_images=input_images)
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/diffsynth/pipelines/hunyuan_video.py", line 106, in encode_prompt
prompt_emb, pooled_prompt_emb, text_mask = self.prompter.encode_prompt(
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/diffsynth/prompters/hunyuan_video_prompter.py", line 288, in encode_prompt
prompt_emb, attention_mask = self.encode_prompt_using_mllm(prompt_formated, images, llm_sequence_length, device,
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/diffsynth/prompters/hunyuan_video_prompter.py", line 191, in encode_prompt_using_mllm
last_hidden_state = self.text_encoder_2(input_ids=input_ids,
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/zwr_workspace/DiffSynth-Studio/diffsynth/models/hunyuan_video_text_encoder.py", line 63, in forward
outputs = super().forward(input_ids=input_ids,
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/transformers/utils/generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 455, in forward
outputs = self.model(
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/transformers/utils/generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "/root/paddlejob/workspace/env_run/miniconda3/envs/Diff-S/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 296, in forward
raise ValueError(
`ValueError: Image features and image tokens do not match: tokens: 1, features 2359296`
Same problem here.
This problem can be resolved by downgrading transformers. In my testing, transformers==4.45 works.
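If it helps, here is a rough sketch for checking whether the environment already has the transformers release mentioned above before rerunning the pipeline. The helper names (`major_minor`, `matches_known_good`, `check_transformers`) are made up for illustration and are not part of DiffSynth-Studio or transformers. The only version information it relies on is the 4.45 report from this thread; other releases are untested here.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release line reported as working in this thread (assumption: only 4.45 was tested).
KNOWN_GOOD = (4, 45)


def major_minor(version_string):
    """Return (major, minor) parsed from a version string such as '4.45.2'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def matches_known_good(version_string, known_good=KNOWN_GOOD):
    """True if the version belongs to the release line reported as working."""
    return major_minor(version_string) == known_good


def check_transformers():
    """Describe the installed transformers version relative to the reported fix."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if matches_known_good(installed):
        return f"transformers {installed} matches the release reported as working"
    return (
        f"transformers {installed} differs from the reported 4.45; "
        "consider: pip install transformers==4.45"
    )


if __name__ == "__main__":
    print(check_transformers())
```

Running the script in the same conda environment as `hunyuanvideo_i2v_24G.py` prints which transformers version is installed and, if it is not in the 4.45 line, the pip command to pin it.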