
【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

8 Text4Vis issues

Thank you for your impressive work. Could you provide your pretrained model without text on HMDB, as shown in Table 6? Thank you very much. Kind regards,

Hello! Your project is very interesting. I would like to adapt it to a regression task on my own dataset. Is this possible, and if so, which parts of the code would need to be modified?

Hello, I am very interested in your work and have two questions. 1. How is the classifier obtained (i.e., the training process)? Specifically, how is the lda_0.1.pt file produced by transferring visual statistical knowledge (LDA), and what is the training process for obtaining classes_features by transferring textual semantic knowledge? 2. The related .pt files, distilbert-base-k400.pt and lda_0.1.pt, are not provided. Looking forward to your reply.

May I ask how the CoOp baseline in the paper is implemented? Is there a tutorial available?

Hello, I found your paper very inspiring and well written. I have two questions. 1. I have successfully reproduced the code. Using the ViT-L/14 pretrained model on two 4090 GPUs, my result is top-1: 95.3% / top-5: 99.2%, which may still fall short of yours. 2. When fusing the visual and text features, you use CLIP's default cosine-similarity computation, but I don't follow the code: it doesn't seem to match the pseudocode in the original CLIP paper. Could you explain what logit_scale does, why it is needed, and why it is initialized this way?

```python
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
logit_scale = self.logit_scale.exp()
logits = logit_scale * image_emb @ text_emb.t()
```
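For context on the question above: in CLIP, the image and text embeddings are L2-normalized before the dot product, so `image_emb @ text_emb.t()` is a matrix of cosine similarities, and `logit_scale` is a learnable temperature stored in log space (so `exp()` keeps it positive), initialized to 1/0.07 as in the CLIP paper. A minimal NumPy sketch of what that snippet computes (an illustration, not the authors' code; `clip_logits` is a hypothetical helper name):

```python
import numpy as np

def clip_logits(image_emb, text_emb, log_scale=np.log(1 / 0.07)):
    """CLIP-style logits: L2-normalize both embedding sets, then scale
    the cosine-similarity matrix by a temperature.

    The temperature is parameterized in log space so that exp() keeps
    it strictly positive during training; exp(log(1/0.07)) ~= 14.3 is
    the initial value used by CLIP.
    """
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    scale = np.exp(log_scale)               # ~= 14.29 at initialization
    return scale * image_emb @ text_emb.T   # shape: (num_images, num_texts)
```

Because the similarities lie in [-1, 1], the scale sharpens the softmax over classes; without it, the cross-entropy gradients would be very flat early in training.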

Which code should I run to reproduce the published result? Also, I noticed that "train_nce.py" is quite similar to the code for [BIKE](https://github.com/whwu95). It would be helpful if you could...

While the links to GitHub are available, all links to OneDrive have expired. However, training on HMDB51 and UCF101 requires the pre-trained ViT-L models, which are now inaccessible. Please extend...