Can I adapt the model for video prediction (like Moving MNIST)?
Thanks for the great work!
Is it possible to adapt the model for video prediction? If so, which decoder model should I use? Thanks for any suggestions!
You could try fine-tuning the stage 1 model together with the VideoMAE V2 decoder. Combined, the two closely resemble an autoencoder and have the potential to predict frames. You would still need to check whether this setup meets your specific requirements.
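One possible way to turn a masked autoencoder into a frame predictor is to mask every token that belongs to a future frame, so the decoder has to reconstruct those frames from the visible past. The sketch below only builds such a mask and is not code from this repository or from VideoMAE V2. The function name `future_frame_mask`, the time-major token ordering, and the tubelet size of 2 and patch size of 16 are assumptions for illustration; the 20-frame, 64x64 clip with 10 context frames follows the usual Moving MNIST setup.

```python
# Hypothetical helper for illustration; not part of this repo or VideoMAE V2.
def future_frame_mask(num_frames, context_frames, tubelet_size=2,
                      image_size=64, patch_size=16):
    """Return one boolean per video token (True = masked, to be predicted).

    Assumes tokens are ordered time-major: all spatial patches of the
    first temporal slot, then all patches of the second slot, and so on.
    """
    if num_frames % tubelet_size or context_frames % tubelet_size:
        raise ValueError("frame counts must be multiples of tubelet_size")
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    if not 0 < context_frames < num_frames:
        raise ValueError("context_frames must be between 0 and num_frames")

    patches_per_slot = (image_size // patch_size) ** 2
    total_slots = num_frames // tubelet_size
    visible_slots = context_frames // tubelet_size

    mask = []
    for slot in range(total_slots):
        # Slots at or after the context boundary are hidden from the encoder.
        mask.extend([slot >= visible_slots] * patches_per_slot)
    return mask


# Moving MNIST style clip: 20 frames, first 10 visible, last 10 to predict.
mask = future_frame_mask(num_frames=20, context_frames=10)
print(len(mask), sum(mask))
```

During fine-tuning, a mask like this would replace the random tube mask, and the reconstruction loss would be computed only on the masked (future) tokens. Whether the actual encoder and decoder accept a mask in this layout needs to be checked against their code.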
Thanks for your suggestions! I will give it a try.