TitleZ99

Results 14 issues of TitleZ99

Hi, thanks for this wonderful work. I am confused about the CrossAttention module. In the XBERT code, when layer_num>=6, the text_encoder switches to cross attention; however, it will do...

Thanks for this great work and the open-sourced repo. I would like to know how to visualize the results after inference, as you show at the end of the repo.

Thanks for this great work. I am wondering whether you just randomly initialized the adaption prompts, since you used zero-init attention in the L layer. I also think the multi-modal...

Thanks for this wonderful work. I have a question about what is shown in the picture. When sr_ratio>1, you apply the conv first, then add the original v, and apply the attention function last. But...

Thanks for this wonderful work. This work is very inspiring. I am confused about how to produce the heat map shown in your paper. Looking forward to your reply at...

Thanks for this inspiring work. I am confused about why you chose Group Normalization as the normalization method. Did you try other normalization methods like Batch Normalization? I think Batch Normalization...

Thanks for this great job. I'm wondering how to run inference on a single 8GB GPU, like the example shown in your README. I tried it on my RTX 2080 Ti with...

Thanks for this great work and the open-sourced repo. I have not worked on segmentation tasks before. I would like to know how to visualize the dense segmentation image like you show...

Has anyone successfully downloaded the Flickr30k dataset provided in this article? I used azcopy but it didn't work. I really need this dataset for some research. If you can, please...