SwiftNet
About the motion model used to generate simulated images in pre-training
Thanks for sharing your amazing work! I have a question about the pre-training stage. In Sec. 4.2.1 of your paper, you mention that 'we maintain an implicit motion model to generate clips with length of 5.' This seems different from the pre-training used in STM. What is the motion model, and how does it work?