MetaTransformer
Meta-Transformer for Unified Multimodal Learning
In the sample code provided, features are concatenated before being processed by the encoder: `features = torch.concat([video_tokenizer(video), audio_tokenizer(audio), time_series_tokenizer(time_data)], dim=1)`. However, when I ran some tokenizers for different modalities, the tokenized shape...
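A minimal sketch of the issue being raised: for concatenation along the sequence axis (`dim=1`) to work, every modality's tokens must already share the same embedding width, so tokenizers with different output widths need a projection first. The shapes and projection layers below are illustrative assumptions, not the repository's actual tokenizer outputs.

```python
import torch
import torch.nn as nn

# Assumed (hypothetical) tokenizer outputs with mismatched embedding widths:
# (batch, seq_len, width)
video_tokens = torch.randn(2, 16, 512)
audio_tokens = torch.randn(2, 32, 128)
ts_tokens = torch.randn(2, 8, 64)

# Project each modality to a shared embedding dimension before concatenation.
dim = 768
proj_video = nn.Linear(512, dim)
proj_audio = nn.Linear(128, dim)
proj_ts = nn.Linear(64, dim)

# Now the sequence-axis concat succeeds: seq lengths add (16 + 32 + 8 = 56),
# while the embedding width is uniform.
features = torch.cat(
    [proj_video(video_tokens), proj_audio(audio_tokens), proj_ts(ts_tokens)],
    dim=1,
)
print(features.shape)  # torch.Size([2, 56, 768])
```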
First of all, congratulations on your work! I opened this issue to ask if you could upload the Data2Seq pre-trained weights; they could be very useful for many researchers. Thanks...
Hi, thanks for your outstanding work! I am trying to use meta-transformer to conduct image classification. I noticed that in the paper, you wrote "On image classification, with the help...
Thank you very much for your outstanding work. I am new to this area of research, and reading your paper was very inspiring. However, I ran into a problem when using the X-ray code: after installing the required libraries, I get an error that `models` cannot be found. What could be the cause?
https://github.com/invictus717/MetaTransformer/blob/b08a2bee6dae578bbbedd124859bfe4201181681/Data2Seq/Data2Seq.py#L52 In the code, `embeddings` is a list containing `input_ids` and `attention_mask`, which causes an error in the `zero_padding` function.
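A sketch of the type mismatch being described, under the assumption that the text tokenizer returns a Hugging Face-style mapping with `input_ids` and `attention_mask`, while the padding step expects a plain tensor. The `zero_padding` helper below is illustrative, not the repository's implementation; the fix shown is to pass only the `input_ids` tensor through.

```python
import torch
import torch.nn.functional as F

# Assumed tokenizer output: a dict-like object, not a single tensor.
encoded = {
    "input_ids": torch.tensor([[101, 2054, 102]]),
    "attention_mask": torch.tensor([[1, 1, 1]]),
}

def zero_padding(tokens: torch.Tensor, target_len: int) -> torch.Tensor:
    """Illustrative helper: zero-pad the sequence axis up to target_len."""
    pad = target_len - tokens.shape[1]
    return F.pad(tokens, (0, pad))

# Passing the whole `encoded` mapping would fail (it has no .shape);
# extract the token-id tensor first.
padded = zero_padding(encoded["input_ids"], target_len=8)
print(padded.shape)  # torch.Size([1, 8])
```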