Visualize Video DiT Attention Maps

Open Passenger12138 opened this issue 1 year ago • 0 comments

System Info

Visualize attention maps for video generation models based on the Diffusers Transformer.

Information

[ ] The official example scripts
[X] My own modified scripts

Reproduction

1. Reference paper: https://arxiv.org/abs/2412.18597
2. Code: https://github.com/Passenger12138/attention-map-diffusers-vdm.git

Expected behavior

Text-to-text attention map (image: attention-t2t)

Text-to-video attention map for the word "jacket" (image)

Video: https://github.com/user-attachments/assets/8ae0f67e-abdb-4aa4-bb65-95b31feae222
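For readers unfamiliar with what a text-to-video attention map for a single word represents, here is a minimal, dependency-free sketch. It is not the code from the linked repository. It assumes the map is the scaled dot-product attention weight that each video token (as a query) assigns to one text token (as a key), reshaped onto a frames × height × width grid. The function names, the toy tensors, and the grid sizes are all hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return softmax(Q K^T / sqrt(d)), one row of weights per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def token_attention_map(weights, token_index, frames, height, width):
    """Take one key token's column and reshape it to [frames][height][width]."""
    column = [row[token_index] for row in weights]
    if len(column) != frames * height * width:
        raise ValueError("number of queries must equal frames * height * width")
    return [
        [
            column[(f * height + r) * width:(f * height + r) * width + width]
            for r in range(height)
        ]
        for f in range(frames)
    ]


# Toy setup (assumption): 2 frames of 2x2 latent patches, 3 text tokens, dim 4.
frames, height, width = 2, 2, 2
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
queries = [
    [2.0 if i % 3 == j else 0.0 for j in range(4)]
    for i in range(frames * height * width)
]

weights = attention_weights(queries, keys)
# Pretend text token 0 is the word of interest (e.g. "jacket").
heat = token_attention_map(weights, 0, frames, height, width)

for f, frame in enumerate(heat):
    print(f"frame {f}:", [[round(v, 3) for v in row] for row in frame])
```

In a real model the queries and keys would come from a chosen attention layer and head (often averaged over heads and denoising steps), and each per-frame grid would be upsampled and overlaid on the decoded video frame.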

Jan 10 '25 08:01 Passenger12138