[Stable Video Diffusion] parameters meaning
Hello! Thank you for the great work! It would be great if you could add a README section on the meaning of the Stable Video Diffusion input parameters. As far as I can guess, cond_aug is the amount of noise added to each frame. There also seem to be 127 variations of motion_bucket_id; what does each value mean?
I tested all buckets on the same input image and got only 2 different movements. Moreover, if I run image-to-video twice in the same script, the two passes give different movement types with the same motion_bucket_id.
motion_bucket_id (int, *optional*, defaults to 127):
The motion bucket ID. Used as conditioning for the generation. The higher the number the more motion will be in the video.
This is the docstring in diffusers. Hope it helps.
In the diffusers code I looked closely at the fps, motion_bucket_id, and noise augmentation parameters. As far as I can tell, they act through the timestep embedding. The code first packs these parameters into add_time_ids using the function _get_add_time_ids and passes that to the UNet, which does something like time_embedding(fps, aug, bucket) + time_embedding(timestep), i.e. the two embeddings are summed. In the DDPM algorithm, the timestep tells the model which step (0-999) it is at, and a larger timestep means the model treats the input z as noisier at that step. My reading is that the added embedding shifts this signal, so the model removes more noise at each step and the result differs more from the original image. I think that's what these parameters do.
As for why there is a bucket parameter: as I read it, the SVD paper runs an optical-flow computation over the dataset to estimate the degree of video motion k, and motion_bucket_id may be a proxy for that degree of motion. When SVD is trained, it would be paired with the motion degree k computed for each training clip.
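To make the "time_embedding(fps, aug, bucket) + time_embedding(timestep)" idea concrete, here is a toy sketch in plain Python. It is not the diffusers implementation: the real UNet uses learned projection layers and much wider embeddings. The function names, the tiny dimension, and the "sum the three chunks" projection are all stand-ins I made up for illustration.

```python
import math


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal embedding of a single scalar."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(value * f) for f in freqs] + [math.sin(value * f) for f in freqs]


def combined_embedding(timestep, fps, motion_bucket_id, noise_aug_strength, dim=8):
    """Toy version of: embedding(timestep) + embedding(fps, bucket, aug)."""
    t_emb = sinusoidal_embedding(timestep, dim)

    # Embed each conditioning scalar separately.
    chunks = [
        sinusoidal_embedding(v, dim)
        for v in (fps, motion_bucket_id, noise_aug_strength)
    ]

    # Hypothetical stand-in for the learned projection: sum the three chunks
    # element-wise so the result has the same width as t_emb.
    aug_emb = [sum(vals) for vals in zip(*chunks)]

    # The conditioning embedding is simply added to the timestep embedding.
    return [t + a for t, a in zip(t_emb, aug_emb)]


if __name__ == "__main__":
    low = combined_embedding(500, fps=6, motion_bucket_id=30, noise_aug_strength=0.02)
    high = combined_embedding(500, fps=6, motion_bucket_id=127, noise_aug_strength=0.02)
    # Same timestep, different bucket: the vector the UNet sees is different.
    print(low != high)
```

The point of the sketch is only that motion_bucket_id changes the vector the UNet is conditioned on at every denoising step; what the model does with that difference is learned during training.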
Thank you, but the docstring only covers what lies on the surface.
Thank you, this explains the process and also shows that the parameter only controls the amount of movement. I had hoped to control the type of movement, but unfortunately this parameter can't do that. But maybe, using the piece of code you've pointed to, we can copy the previous movement and use it next time...
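One thing that might help with "copying the previous movement": the two passes probably differ because each run starts from different random noise, so reusing the same seed should, as far as I know, give the same motion for the same image and parameters. A minimal sketch with the diffusers pipeline follows. The checkpoint name, image size, and the helper name generate_clip are my assumptions, and I have not verified bit-exact reproducibility across hardware.

```python
def generate_clip(image_path, seed, motion_bucket_id=127, noise_aug_strength=0.02):
    """Run SVD image-to-video with a fixed seed (hypothetical helper)."""
    # Imports are kept inside the function so that defining it stays cheap.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image

    # Assumed checkpoint; swap in whichever SVD weights you actually use.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = load_image(image_path).resize((1024, 576))

    # Fixing the generator seed fixes the initial noise for this run.
    generator = torch.Generator(device="cpu").manual_seed(seed)

    result = pipe(
        image,
        motion_bucket_id=motion_bucket_id,
        noise_aug_strength=noise_aug_strength,
        generator=generator,
        decode_chunk_size=8,
    )
    return result.frames[0]  # list of PIL frames for the single input image
```

Calling generate_clip("input.png", seed=42) twice should then give the same movement, and changing only motion_bucket_id with the seed held fixed makes it easier to see what the bucket alone changes.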
From the name of the parameter, motion_bucket_id, I would guess that they sorted the videos in the dataset by motion magnitude, so each motion_bucket_id would refer to some kind of motion. If they clustered motions, then somewhere in 1..300(?) there would be camera motion from left to right, somewhere right to left, and so on. It would be great if somebody from stability.ai shared samples from each bucket to use as a map.
