tigerzjh

Results 15 comments of tigerzjh

Thanks for your answer. 1. You said many tags are non-leaf. Does that mean one image may belong to multiple classes (an ancestor and its subclass)? 2. You said if any tag of...

I have read the whole paper. The transfer experiment says that using only Tencent ML-Images performs worse than using only ImageNet, while Tencent ML-Images plus ImageNet is better...

The solution is to update the Qt packages to these versions: PyQt5-5.15.4, PyQt5-Qt5-5.15.2, PyQt5-sip-12.8.1
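A minimal sketch of how the pins above could be checked from Python. It assumes the three names are the pip distribution names; the helper names `pip_requirements` and `mismatches` are made up for illustration and are not part of any project discussed here.

```python
from importlib import metadata

# Versions taken from the comment above; treating them as pip
# distribution names is an assumption.
PINNED = {
    "PyQt5": "5.15.4",
    "PyQt5-Qt5": "5.15.2",
    "PyQt5-sip": "12.8.1",
}


def pip_requirements(pins):
    """Turn {name: version} into 'name==version' requirement strings."""
    return [f"{name}=={version}" for name, version in pins.items()]


def mismatches(pins):
    """Return {name: installed_version_or_None} for packages that differ from the pin."""
    found = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            found[name] = installed
    return found


if __name__ == "__main__":
    # Print the install command and whatever currently deviates from the pins.
    print("pip install " + " ".join(pip_requirements(PINNED)))
    print("not matching the pins:", mismatches(PINNED))
```

An empty dictionary from `mismatches` means the environment already matches the three versions.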

Same question: how do I fine-tune InstructBLIP?

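I also have a few questions about this comparison chart: 1. Is this a setup where detection runs first and the model is called afterwards? 2. Have we tried distilling ResNet50? 3. I see that the final feature dimensions differ across our models (ResNet50 1024, ViT-B-16 512, ViT-L-14 768). Can distillation still be done in that case?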

Can a ViT be used to distill ResNet50, given that the two architectures differ so much?

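@xinyu1205 In the paper I can find only three scattered places that mention gradient backpropagation (that is, which parameters are trained): 1. "Figure 3" draws the text encoder as frozen. 2. The section "A. More Implementation Details" says "we employ the CLIP image encoder paired with the frozen text encoder to distill image feature, making full use of...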

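@xinyu1205 Compared with RAM, the main improvements in RAM++ are: * The text no longer uses learnable queries; it is replaced by sentences written by GPT, which are then encoded with the CLIP text encoder (in both training and testing). * The loss over the whole sentence is no longer a generation loss but the ASL loss, and the model as a whole has been streamlined. Are RAM's image-tag recognition decoder and RAM++'s alignment decoder almost identical in parameter count and structure? Is this understanding correct?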
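For readers unfamiliar with the ASL (asymmetric loss) mentioned in the comment above, here is a minimal pure-Python sketch of the commonly published form of that loss for multi-label tagging. It is not taken from the RAM or RAM++ code. The function name and the hyperparameter values (`gamma_pos`, `gamma_neg`, `margin`) are assumptions chosen for illustration, and the project's actual settings may differ.

```python
import math


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-tag asymmetric losses for one image.

    logits  : raw scores, one per tag
    targets : 1 if the tag applies, 0 otherwise
    The default hyperparameters are illustrative assumptions.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability for this tag
        if y == 1:
            # Positive tag: focal-style term with exponent gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative tag: shift the probability down by the margin, so
            # easy negatives (p <= margin) contribute exactly zero.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    # One positive tag and two negatives: a confident wrong one and an easy one.
    print(asymmetric_loss([2.0, 3.0, -5.0], [1, 0, 0]))
```

The asymmetry comes from the separate exponents and the margin: negatives, which vastly outnumber positives in tagging data, are down-weighted much more aggressively than positives.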