Zhenwei

17 comments by Zhenwei

Was the pretrained model you provided (at http://nlp.cs.unc.edu/data/model_LXRT.pth) trained for 20 epochs or 12 epochs? Can the 12-epoch pretrained model achieve 79% accuracy on the VQA dataset?

Thanks for your reply, but I'm not sure I've made my point clear. Sorry, I still have some questions.

> I assume the issue with `print(model.config.torch_dtype)` might be...
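For context, a minimal sketch of the kind of check under discussion, assuming a Hugging Face `transformers` model; the checkpoint name is just an example. `model.config.torch_dtype` is config metadata and does not have to match the dtype the weights are actually held in:

```python
import torch
from transformers import AutoModel

# Example checkpoint; any transformers model works the same way.
model = AutoModel.from_pretrained("bert-base-uncased", torch_dtype=torch.float16)

# What the config records about the dtype (metadata).
print(model.config.torch_dtype)

# What the parameters are actually stored as in memory.
print(next(model.parameters()).dtype)
```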

## Platform
MacBook Air (Retina, 13-inch, 2020), 1.1 GHz quad-core Intel Core i5
OS: macOS Catalina, v10.15.7 (19H2)

## Python version
As you can see in the screenshot

## pip...

The deploy video output is still being debugged. (Since I didn't re-fork, the file changes from the previous assignment are still visible under "Files changed"; just collapse those and look only at the chapter3 files. Sorry about that.)

> https://github.com/ParadoxZW/LLaVA-UHD-Better/blob/main/llava_uhd/adapt_llava.py#L136-L138
>
> Since the first token here is for CLS, shouldn't
>
> ```python
> m[:w * h] = True
> ```
>
> be changed to
>
> ```python
> m[:w *...
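To make the off-by-one concrete, a minimal sketch under the assumption that the sequence is one CLS token followed by `w * h` patch tokens; `m`, `w`, and `h` mirror the names in the quoted snippet, but the values are hypothetical:

```python
import torch

w, h = 24, 24          # patch grid size (hypothetical values)
n_tokens = 1 + w * h   # one CLS token plus one token per patch

m = torch.zeros(n_tokens, dtype=torch.bool)

# m[:w * h] = True would cover CLS plus only the first w * h - 1
# patches; shifting past the CLS token covers exactly the patches.
m[1:1 + w * h] = True
```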

> Also, I checked CLIP's code; in CLIP, shouldn't `-float('inf')` be what marks a masked position in the attention mask, rather than 0 or 1?

This is what I copied from the comments on the `forward` function of the `CLIPEncoder` class in `~/miniconda3/envs/llv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py`:

```
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
    Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
    - 1 for...
```
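The two views are compatible: the documented interface takes a `[0, 1]` mask, and internally the library expands it into an additive mask (0 where attention is allowed, a very large negative number where it is not) before adding it to the attention scores. A minimal sketch of that conversion, not the library's exact code:

```python
import torch

def expand_mask(mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Expand a (batch, seq_len) mask of 0s/1s into an additive
    (batch, 1, 1, seq_len) mask: 0.0 where attending is allowed,
    a large negative value where it is masked."""
    inverted = 1.0 - mask[:, None, None, :].to(dtype)
    return inverted.masked_fill(inverted.bool(), torch.finfo(dtype).min)

mask = torch.tensor([[1, 1, 1, 0]])           # last position is padding
additive = expand_mask(mask, torch.float32)   # 0, 0, 0, ~-3.4e38
# The additive mask is summed with the raw attention scores before
# softmax, driving masked positions' weights to effectively zero.
```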

Thank you for your attention and feedback. P.S. Support for training with the vision encoder unfrozen still has a small issue; I will update it as soon as possible (mainly because I have no GPUs these days, so training is delayed). If you plan to run experiments, I suggest not enabling that option for now.

> > > Also, I checked CLIP's code; in CLIP, shouldn't `-float('inf')` be what marks a masked position in the attention mask, rather than 0 or 1?
> >
> > This is what I copied from the comments on the `forward` function of the `CLIPEncoder` class in `~/miniconda3/envs/llv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py`:
> >
> > ```
> > attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
> > Mask to avoid performing...

If you'd like, you can submit a PR based on the current version of the code to fix the issues above, and I can merge it. Of course, I can also make the update myself.