tigerzjh comments

Results: 15 comments of tigerzjh

Tag Augmentation of Images

Thanks for your answer. 1. You said many tags are non-leaf. Does that mean one image may be included in multiple classes (ancestor and subclass)? 2. You said if any tag of...

Tag Augmentation of Images

I have read the whole paper. The transfer experiment says that performance using only Tencent ML-image is worse than using only ImageNet, while Tencent ML-image and ImageNet together is better...

On Mac, after pressing a button, the button color changes slowly or does not change at all.

The solution is to update the Qt version to: PyQt5-5.15.4, PyQt5-Qt5-5.15.2, PyQt5-sip-12.8.1
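For anyone applying this fix, here is a minimal sketch that checks whether the three packages named above are installed at those versions and builds the matching pip command. The helper names `check_pins` and `pip_command` are hypothetical, chosen for illustration; only the package names and versions come from the comment.

```python
from importlib import metadata

# Versions quoted in the comment above.
PINNED = {
    "PyQt5": "5.15.4",
    "PyQt5-Qt5": "5.15.2",
    "PyQt5-sip": "12.8.1",
}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


def pip_command(pins=PINNED):
    """Build a pip command that installs exactly the pinned versions."""
    return "pip install " + " ".join(f"{n}=={v}" for n, v in pins.items())


if __name__ == "__main__":
    for name, (installed, wanted) in check_pins().items():
        print(f"{name}: installed={installed}, wanted={wanted}")
    print(pip_command())
```

If `check_pins()` reports any mismatch, running the printed command in the same environment should bring the packages to the versions listed in the comment.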

Fine-tuning InstructBLIP?

Same question: how do I fine-tune InstructBLIP?

Results after knowledge distillation

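I also have a few questions about this comparison chart: 1. Is the model called after detection is run first? 2. Have we tried distilling resnet50? 3. I see that the final feature dimensions differ across our models (resnet50 1024, ViT-B-16 512, ViT-L-14 768); can distillation still be done?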

In the pre-train and finetune stages, do the swin-transformer, CLIP text encoder, and tag embedding all receive backpropagated gradients, and do they all use the same learning rate?

@xinyu1205 In the paper I only see three scattered places that mention gradient backpropagation (which parameters are trained): 1. "figure 3" draws the Text encoder as frozen 2. The section "A. More Implementation Details" says "we employ the CLIP image encoder paired with the frozen text encoder to distill image feature, making full use of...

In the pre-train and finetune stages, do the swin-transformer, CLIP text encoder, and tag embedding all receive backpropagated gradients, and do they all use the same learning rate?

I definitely have to read both. This work is really great, haha.

In the pre-train and finetune stages, do the swin-transformer, CLIP text encoder, and tag embedding all receive backpropagated gradients, and do they all use the same learning rate?

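@xinyu1205 The main improvements of RAM++ over RAM are:
* The text side no longer uses learnable queries; it uses sentences written by GPT, which are then encoded by the CLIP text encoder (training & testing)
* The loss over the whole sentence is no longer a generation loss but the ASL loss, and the overall model has been simplified.
Are RAM's Image Tag recognize decoder and RAM++'s alignment decoder nearly identical in parameter count, structure, and so on? Is that a fair way to understand it?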
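For readers unfamiliar with the ASL (asymmetric loss) mentioned in the comment above, here is a minimal pure-Python sketch of its usual per-tag form. It is an illustration under stated assumptions, not RAM++'s training code: the function name `asymmetric_loss` and the default values for `gamma_pos`, `gamma_neg`, and `margin` are assumptions and are not taken from the thread.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-tag asymmetric losses for one sample.

    probs   -- predicted probability for each tag, in [0, 1]
    targets -- 1 if the tag is present, 0 otherwise
    The defaults are illustrative assumptions, not values from the thread.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive tag: focal-style term with its own exponent.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative tag: shift the probability down by the margin so
            # very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    # Two tags: the first is present, the second is absent.
    print(asymmetric_loss([0.7, 0.2], [1, 0]))
```

The asymmetry comes from treating positives and negatives differently: negatives get a larger focusing exponent and a probability margin, which down-weights the many easy negative tags in multi-label tagging.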

In the pre-train and finetune stages, do the swin-transformer, CLIP text encoder, and tag embedding all receive backpropagated gradients, and do they all use the same learning rate?

Thanks a lot, @xinyu1205