3 or more modalities in one model

Open vzapylikhin opened this issue 2 years ago • 1 comment

Hello! Your transformer is amazing! I'm a beginner in data science, and I have a research task for my university: we want to predict how negotiations will end. We have several modalities, including video, audio, and time-series EEG. Do you have a demo showing how to use the transformer for tasks like this? If so, please share it. Thanks!

Sep 30 '23 10:09 vzapylikhin

Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We provide online demos and have open-sourced all training and validation code.

Oct 16 '23 01:10 LinB203