DidiD1
I guess the author uses this package to compare against the performance of GAN-based methods in the code, so if you just want to run AnoDDPM, you can delete all...
the code I use:

```bash
python evaluate.py \
    --videos_path ./VBench/11_29_3s \
    --dimension "motion_smoothness" \
    --mode "custom_input"
```

and the output looks like:

```json
{
    "subject_consistency": [
        0.0,
        [
            {
                "video_path": xxx, ...
```
> same problem here #87

I tracked the bug and found something strange: the image_features from CLIP are all 0, which leads to the result of compute_background_consistency...
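For anyone hitting the same all-zero features, a minimal check may help narrow the cause before digging into VBench itself (the helper name here is illustrative, not part of the repo); a fully zero embedding usually points at decode failures (black frames) or a checkpoint that failed to load, rather than a genuine score of 0:

```python
import numpy as np

def check_zero_features(feats: np.ndarray, name: str = "image_features") -> bool:
    """Return True if every element of the feature array is (near) zero.

    An all-zero CLIP embedding typically means the input frames decoded as
    black images (bad video path / decoder failure) or the model weights
    did not load, not that the videos are genuinely dissimilar.
    """
    all_zero = bool(np.allclose(feats, 0.0))
    if all_zero:
        print(f"WARNING: {name} is all zero; check frame decoding and checkpoint loading")
    return all_zero

# A zero tensor trips the check; a normal embedding does not.
print(check_zero_features(np.zeros((8, 512))))   # True
print(check_zero_features(np.ones((8, 512))))    # False
```

Dropping a call like this right after the CLIP forward pass should tell you whether the zeros originate in the features or later in compute_background_consistency.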
This phenomenon was mentioned in the SD3 paper; maybe that is why they proposed the 'mode sampling with heavy tails' timestep-sampling method. However, it's strange that in their experimental results 'log-norm' is much better the...
Thanks a lot. And for my question 3: "when we use logit_normal, it is based on the RF setting, so the weight of the loss should be t/(1-t), but the code doesn't compute...
> > currently we're using sigmoid sampling for timesteps, which seems fine, but no one has really ablated whether it leaves fine details out
>
> Actually, sigmoid and lognorm...
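To make the discussion above concrete, here is a minimal sketch of logit-normal (sigmoid-of-Gaussian) timestep sampling together with the t/(1-t) rectified-flow loss weight mentioned in the earlier comment. Function names are illustrative and not taken from any repo discussed here:

```python
import torch

def sample_t_logit_normal(batch_size: int, mean: float = 0.0, std: float = 1.0) -> torch.Tensor:
    """Logit-normal timestep sampling: t = sigmoid(u) with u ~ N(mean, std).

    This concentrates samples around t = 0.5 and thins out both endpoints,
    which is the 'log-norm' schedule discussed for SD3-style training.
    """
    u = torch.randn(batch_size) * std + mean
    return torch.sigmoid(u)  # t lies strictly in (0, 1)

def rf_loss_weight(t: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Per-sample weight t/(1-t) for the rectified-flow loss, clamped for stability."""
    return t / (1.0 - t).clamp_min(eps)

t = sample_t_logit_normal(4)
w = rf_loss_weight(t)
assert ((t > 0) & (t < 1)).all() and (w > 0).all()
```

Whether a given codebase applies this weight explicitly or folds it into its loss parameterization is exactly the question raised above, so treat the weight function as an assumption to verify against the actual code.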
> What is this cache for? Does it mean I can call encode multiple times (splitting on the n_frame dimension) to lower the maximum GPU memory requirement while getting the...
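On the chunked-encode idea: a sketch of splitting along the frame axis looks like the helper below. Note the caveat in the docstring, which is the whole point of such a cache: chunking is only exact when the encoder has no cross-chunk temporal mixing, or handles that mixing through the cache. The function names are placeholders, not the repo's API:

```python
import torch

def encode_in_chunks(encode_fn, video: torch.Tensor, chunk: int = 8) -> torch.Tensor:
    """Encode a (B, C, T, H, W) video in chunks along the frame (T) axis.

    This matches a single encode_fn(video) call only if the encoder treats
    frames (or chunks) independently, or carries state across chunks via a
    cache; otherwise results differ at chunk boundaries.
    """
    parts = [encode_fn(video[:, :, s : s + chunk]) for s in range(0, video.shape[2], chunk)]
    return torch.cat(parts, dim=2)

# Toy frame-independent "encoder": chunked and full encodes agree exactly.
toy_encode = lambda x: x * 2.0
v = torch.randn(1, 3, 20, 4, 4)
assert torch.equal(encode_in_chunks(toy_encode, v), toy_encode(v))
```

Peak memory then scales with the chunk length rather than the full clip length, which is why this pattern is common for video VAEs.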
> As the title says: during inference, can num_frames only be set to 49, or can a shorter frame count be chosen? The results I generated with a shorter frame count have some issues.

It seems t2v can change the frame count, but i2v cannot. I guess it may be related to i2v's learnable_pos_embed?
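If that guess about learnable_pos_embed is right, the failure mode would be that a learned positional-embedding table has a fixed length tied to the training frame count (49 here), so a shorter clip indexes it inconsistently. A common workaround, sketched below with purely hypothetical names and a toy frame-major layout, is slicing the table down to the requested frame count:

```python
import torch

# Toy learned table: 49 frames x 16 tokens per frame, embedding dim 128.
pos_embed = torch.randn(49 * 16, 128)

def pos_embed_for_frames(pe: torch.Tensor, tokens_per_frame: int, num_frames: int) -> torch.Tensor:
    """Slice a frame-major learnable positional embedding down to num_frames.

    Only valid if the table really is laid out frame-major; i2v models that
    interleave image-condition and video positions may need extra care.
    """
    return pe[: tokens_per_frame * num_frames]

print(pos_embed_for_frames(pos_embed, 16, 13).shape)  # torch.Size([208, 128])
```

Whether the actual i2v checkpoint tolerates this depends on how its positional embedding is laid out, so this is a debugging direction rather than a confirmed fix.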
I also ran into this issue; it shows that the encoded embedding of the video is all 0, which is strange.
3D full attention vs. 2D+1D can be understood as different patchification schemes: 3D full attention patchifies all three dimensions at once, with one patch being 2*2*1 (h*w*t), whereas the separate (2D+1D) attention keeps the temporal and spatial dimensions distinct and attends over each of them separately.
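The 3D-full-attention side of that comparison can be sketched as follows: every 2*2*1 (h*w*t) patch becomes one token, and time and space are merged into a single sequence axis that full attention then runs over. The function is a toy illustration, not any particular model's code:

```python
import torch

def patchify_3d(video: torch.Tensor, ph: int = 2, pw: int = 2, pt: int = 1) -> torch.Tensor:
    """Flatten a (B, C, T, H, W) video into one token sequence.

    3D full attention: each (pt x ph x pw) patch is a token, and all tokens
    attend to each other, mixing time and space in one axis. A 2D+1D scheme
    would instead keep T' as its own axis and alternate spatial/temporal
    attention over it.
    """
    b, c, t, h, w = video.shape
    x = video.reshape(b, c, t // pt, pt, h // ph, ph, w // pw, pw)
    x = x.permute(0, 2, 4, 6, 1, 3, 5, 7)  # (B, T', H', W', C, pt, ph, pw)
    return x.reshape(b, (t // pt) * (h // ph) * (w // pw), c * pt * ph * pw)

v = torch.randn(1, 3, 4, 8, 8)
print(patchify_3d(v).shape)  # torch.Size([1, 64, 12]): 4*4*4 tokens of dim 3*1*2*2
```

The token count grows as T'*H'*W', which is why 3D full attention is expensive for long clips, while 2D+1D keeps each attention pass over a much shorter axis.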