
Results 11 comments of DidiD1

I guess the author uses this package to compare against the performance of GAN-based methods in the code, so if you just want to run AnoDDPM, you can delete all...

the code I use:

```bash
python evaluate.py \
    --videos_path ./VBench/11_29_3s \
    --dimension "motion_smoothness" \
    --mode "custom_input"
```

and the output looks like:

```json
{
    "subject_consistency": [
        0.0,
        [
            {
                "video_path": xxx,...
```

> same problem here #87 I tracked the bug and found something strange: the image_features from CLIP are all 0, which leads to the result of compute_background_consistency...

This phenomenon was mentioned in the SD3 paper, which may be why they proposed the 'mode sampling with heavy tails' time-sampling method. However, it's strange that in their experimental results 'log-norm' is much better the...

Thanks a lot. And for my question 3: "when we use logit_normal, it is based on the RF setting. So the weight of the loss should be t/(1-t), but the code doesn't compute...
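For reference, here is a minimal sketch of the point being made: logit-normal timestep sampling paired with the t/(1-t) loss weight the RF setting would imply. The function names and defaults are mine for illustration, not from the repository's code:

```python
import math
import random

def sample_t_logit_normal(mean=0.0, std=1.0):
    # Logit-normal sampling: pass a Gaussian draw through the sigmoid,
    # so logit(t) is normally distributed and t lies in (0, 1).
    u = random.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))

def rf_loss_weight(t):
    # The t/(1-t) per-sample weight the comment argues the RF setting
    # implies but which the code does not appear to apply.
    return t / (1.0 - t)

t = sample_t_logit_normal()
w = rf_loss_weight(t)
assert 0.0 < t < 1.0 and w > 0.0
```

Whether the training loop should multiply the per-sample loss by this weight (or fold it into the sampling density instead) is exactly the question raised above.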

> > currently we're using sigmoid sampling for timesteps which seems fine but no one has really ablated whether it leaves fine details out
>
> Actually, sigmoid and lognorm...
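The equivalence this reply hints at can be shown in a few lines: drawing u from a standard normal and taking t = sigmoid(u) is exactly logit-normal ("lognorm") sampling, since logit(t) recovers the Gaussian draw. An illustrative sketch, not the project's code:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(t):
    return math.log(t / (1.0 - t))

# "Sigmoid sampling": t = sigmoid(u) with u drawn from a standard normal.
u = random.gauss(0.0, 1.0)
t = sigmoid(u)

# logit(t) == u, so logit(t) is normally distributed: sigmoid sampling
# and logit-normal sampling are the same timestep distribution.
assert abs(logit(t) - u) < 1e-9
```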

> What's this cache aiming for? Does it mean I can call the encode multiple times (split on the n_frame dimension) to lower maximum GPU memory requirements while getting the...

> As the title asks: during inference, can num_frames only be set to 49? Is it possible to choose a shorter number of frames? The results I tried to generate have some issues.

It seems t2v can change the frame count, but i2v can't. I guess it might be related to i2v's learnable_pos_embed?

I also ran into this issue: it shows that the encoded embedding of the video is 0, which is strange.

3D full attention and 2D+1D can be understood as different ways of patchifying: 3D full attention patchifies all three dimensions directly, with each patch being 2\*2\*1 (h\*w\*t), whereas doing the attentions separately preserves the temporal and spatial dimension information respectively.
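A toy sketch of the 2\*2\*1 (h\*w\*t) patchify described above, with made-up shapes chosen only for illustration:

```python
import numpy as np

# Toy latent of shape (t, h, w, c) — hypothetical sizes for illustration.
T, H, W, C = 8, 4, 4, 3
latent = np.arange(T * H * W * C, dtype=np.float32).reshape(T, H, W, C)

# 3D "full" patchify with patch size 2x2x1 (h*w*t): each token packs a
# 2x2 spatial block from a single frame.
ph, pw, pt = 2, 2, 1
tokens = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
tokens = tokens.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, pt * ph * pw * C)

# Full 3D attention then runs over all (T//pt) * (H//ph) * (W//pw) tokens
# jointly, while a 2D+1D scheme would attend spatially within each frame
# and temporally across frames, keeping the two axes separate.
print(tokens.shape)  # (32, 12): 8*2*2 tokens, each of dim 1*2*2*3
```

With pt > 1 the same reshape would also mix frames into each token, which is where the two designs diverge in how much temporal structure each token retains.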