Any plan to support fp16 / bf16?
Thank you for the amazing work!
I was trying to train the Vista model with the OpenDV-YouTube dataset (also a great work, thanks!) and found that OOM sometimes happens in my environment, so I tried to modify the codebase to use bf16 and found that it cannot be done with a one-line change. Do you have a plan to support bf16 / fp16 training?
Again, thank you for the great work!
Thanks for your suggestion! We will actively optimize the memory usage, including mixed precision training. Until we do, you can:
- Try some memory-efficient training techniques collected in Optix (one common technique of this kind, gradient checkpointing, is sketched below).
- Train at a lower resolution (such as 320x576), since the techniques above may require modifying multiple lines of code.
Also, if you find anything useful, you are welcome to help us improve the code!
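For reference, here is a generic sketch (not taken from Optix or our codebase; the model and tensor shapes are made up) of gradient checkpointing, which saves memory by recomputing activations during the backward pass instead of storing them:

```python
# Generic illustration of gradient (activation) checkpointing in plain PyTorch.
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(8)])
x = torch.randn(16, 512, requires_grad=True)

# Split the model into 2 segments; activations inside each segment are
# recomputed during backward instead of being kept in memory.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```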
I didn't know about Optix; it looks very helpful for my case. Thank you!
> Train at a lower resolution (such as 320x576)
Generation quality is one of the key points of your method, so I was reluctant to do this (train with lower-resolution images), but it seems I should try it first to see the impact of your method.
> Also, if you find anything useful, you are welcome to help us improve the code!
Of course! It is not enough yet, but I have tried modifying several lines of the code to adapt it to bf16 training. It looks like the DeepSpeed integration in PyTorch Lightning is experimental, and it requires manually casting a lot of tensors to half-precision types. Once I find a way to make bf16 training work, I will open a PR for it.
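To give an idea of what I mean, here is a minimal toy sketch (not Vista's actual training code; the module, dataset, and key names are made up) of the kind of manual cast that seems to be needed when combining `strategy: deepspeed_stage_2` with bf16:

```python
import torch
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(64, 64)

    def training_step(self, batch, batch_idx):
        x = batch["frames"]
        # Data arrives in fp32, so it has to be cast by hand; otherwise the
        # bf16 weights managed by DeepSpeed raise a dtype-mismatch error.
        x = x.to(torch.bfloat16)
        return self.net(x).pow(2).mean()

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)


if __name__ == "__main__":
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=1,
        max_steps=10,
        precision="bf16-mixed",        # "bf16" on older Lightning releases
        strategy="deepspeed_stage_2",  # requires the deepspeed package
    )
    data = torch.utils.data.DataLoader(
        [{"frames": torch.randn(64)} for _ in range(32)], batch_size=4
    )
    trainer.fit(ToyModule(), data)
```

In the real codebase, similar casts seem to be needed in many more places, which is why a one-line change was not enough.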
By the way, you can disable DeepSpeed if it is an obstacle to bf16 training (just remove `strategy: deepspeed_stage_2` from the config). I use DeepSpeed mainly to reduce memory occupation, but bf16 training might be able to save more memory.
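As a sketch of the equivalent Trainer arguments (the exact key names in the YAML config may differ), dropping DeepSpeed and enabling bf16 mixed precision would look roughly like this:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision="bf16-mixed",          # "bf16" on older Lightning releases
    # strategy="deepspeed_stage_2",  # removed, as described above
)
```

In the config this corresponds to deleting the `strategy: deepspeed_stage_2` line and setting the precision option to bf16.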