DiffSynth-Studio
Fix Wan T2V inference with batch size larger than 1
Currently, Wan2 T2V inference fails with a batch size larger than 1 due to:
- An incompatible shape between the time conditioning and the modulation tensor
- A bug in the text encoder that truncates the text to the smallest token length available in the batch.
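The first issue is the kind of broadcasting mismatch that only surfaces once the batch dimension exceeds 1. As a minimal sketch (the tensor names and shapes here are illustrative assumptions, not the actual DiffSynth-Studio code), adding explicit singleton axes makes the per-sample time conditioning combine cleanly with a shared modulation table:

```python
import torch

batch, dim = 2, 8
# Hypothetical shapes: per-sample time embedding and a shared modulation table
t_emb = torch.randn(batch, dim)   # time conditioning, one row per sample
modulation = torch.randn(6, dim)  # learned modulation parameters (shared across batch)

# A naive (batch, dim) + (6, dim) addition only broadcasts when batch == 6;
# inserting explicit singleton axes makes the intent unambiguous for any batch:
mod = modulation.unsqueeze(0) + t_emb.unsqueeze(1)  # -> (batch, 6, dim)
```

With batch size 1 the naive addition can silently broadcast, which is why the bug goes unnoticed until larger batches are used.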
I introduced two small fixes, but I am not sure whether other inference/training paths (e.g., I2V) are also affected.
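For the second issue, the fix is to pad each prompt in the batch to the longest token sequence rather than truncating everything to the shortest one. A minimal standalone sketch of that behavior (the function name and pad id are assumptions for illustration, not the repo's actual text-encoder code):

```python
import torch

def pad_token_batch(token_ids_list, pad_id=0):
    """Pad variable-length token id sequences to the longest length in the
    batch (instead of truncating to the shortest), returning ids and a mask."""
    max_len = max(len(ids) for ids in token_ids_list)
    batch = torch.full((len(token_ids_list), max_len), pad_id, dtype=torch.long)
    mask = torch.zeros(len(token_ids_list), max_len, dtype=torch.bool)
    for i, ids in enumerate(token_ids_list):
        batch[i, : len(ids)] = torch.tensor(ids, dtype=torch.long)
        mask[i, : len(ids)] = True  # mark real tokens; padding stays False
    return batch, mask
```

The attention mask lets the text encoder ignore the padded positions, so no prompt loses tokens just because a shorter prompt is present in the same batch.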