
Why is encoder_hidden_state used in the motion module?

junwenxiong opened this issue • 3 comments

Can you explain why encoder_hidden_state is used in the motion module? The motion module, as described in the paper, is vanilla temporal self-attention, not cross-attention. https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/animatediff/models/unet_blocks.py#L411

junwenxiong avatar Feb 29 '24 14:02 junwenxiong

Looking inside the motion module's attention (VersatileAttention), encoder_hidden_states is replaced by hidden_states, so the attention ultimately operates as self-attention. In other words, encoder_hidden_states is not actually used in the motion module's attention.

Taeu avatar Mar 06 '24 05:03 Taeu

But that alone does not prove that encoder_hidden_states is None and gets replaced by hidden_states.

junwenxiong avatar Mar 06 '24 05:03 junwenxiong

You should check TemporalTransformerBlock in motion_module.py. When the VersatileAttention blocks are created, cross_attention_dim is None because attention_block_types is [ "Temporal_Self", "Temporal_Self" ]. From there you can trace that encoder_hidden_states is None and gets replaced by hidden_states.
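The mechanism described above can be sketched in a few lines. This is a hypothetical simplification, not the actual AnimateDiff code: projections and multi-head logic are dropped, and `versatile_attention` is an illustrative name. The point it demonstrates is that when `cross_attention_dim` is None (as for "Temporal_Self" blocks), any `encoder_hidden_states` passed in is discarded and the keys/values come from `hidden_states` itself, i.e. pure self-attention:

```python
import numpy as np

def versatile_attention(hidden_states, encoder_hidden_states=None,
                        cross_attention_dim=None):
    """Toy sketch of the key/value source selection discussed above.

    cross_attention_dim is None for "Temporal_Self" blocks, so the
    text embedding is ignored and attention reduces to self-attention.
    """
    is_cross_attention = cross_attention_dim is not None
    if not is_cross_attention:
        # encoder_hidden_states is dropped; keys/values come from
        # hidden_states itself -> self-attention
        encoder_hidden_states = hidden_states

    # single-head scaled dot-product attention, no learned projections
    q = hidden_states
    k = v = encoder_hidden_states
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `cross_attention_dim=None`, passing an `encoder_hidden_states` tensor produces exactly the same output as passing nothing, which is the sense in which the temporal attention is "self" despite the argument appearing in the signature.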

Taeu avatar Mar 06 '24 07:03 Taeu