Baijiong Lin
Yep, I have also run into this issue. It is because the ``out_proj`` in ``MultiheadAttention`` is a ``NonDynamicallyQuantizableLinear`` rather than a plain ``Linear`` layer. https://github.com/pytorch/pytorch/blob/dbb96ef30da4e50bdbecb56dfb9b2c43b8a39e9d/torch/nn/modules/activation.py#L1008
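For illustration, here is a minimal sketch (assuming a recent PyTorch version) of why an exact-type check on ``nn.Linear`` misses ``out_proj``, even though it still subclasses ``Linear``:

```python
import torch.nn as nn
from torch.nn.modules.linear import NonDynamicallyQuantizableLinear

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4)

print(isinstance(mha.out_proj, NonDynamicallyQuantizableLinear))  # True
print(isinstance(mha.out_proj, nn.Linear))                        # True: it subclasses Linear
print(type(mha.out_proj) is nn.Linear)                            # False: exact-type checks skip it
```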
Sounds like a good idea. I will try to fix it (maybe in two weeks or so; I am busy with some deadlines at the moment).
@mounchiliu Thanks for your suggestion. I have fixed this problem.
@marcomistretta @ghost Sorry for the late reply. The LoRA of ``out_proj`` is not updated because of an incorrect initialization of the LoRA weights, not because of the use of ``NonDynamicallyQuantizableLinear``. I have...
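For context, the standard LoRA initialization keeps ``lora_A`` random and ``lora_B`` at zero, so the adapter starts as a no-op but still receives gradients. A generic sketch (not the exact code from the fix):

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper around a frozen Linear layer."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r
        # A is random, B is zero: the initial LoRA update is zero,
        # but gradients flow to both A and B during training.
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```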
I suggest using our re-implementation https://github.com/median-research-group/LibMTL/blob/main/LibMTL/weighting/Aligned_MTL.py.
I have the same issue.
What does ``share_expert_gate`` refer to?
(1) Our implementation does seem to be missing the shared expert gate. I will correct this error later; thanks for pointing it out. (2) This line only instantiates an encoder https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L110, and the forward of ``_transform_resnet_PLE`` is actually called here https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L115. (3) Yes, we use ResNet here, so it is set to 5 layers.
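For reference, a shared expert gate in a PLE-style layer typically mixes the outputs of all experts (shared plus every task's experts) with a softmax gate, whereas a task-specific gate only sees the shared experts and that task's experts. A minimal illustrative sketch (not LibMTL's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertGate(nn.Module):
    """Illustrative shared-expert gate for a PLE-style layer."""
    def __init__(self, input_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, gate_input, expert_outputs):
        # gate_input:     (batch, input_dim)
        # expert_outputs: (batch, num_experts, feature_dim), shared + all task experts
        weights = F.softmax(self.gate(gate_input), dim=-1)          # (batch, num_experts)
        return torch.einsum('be,bed->bd', weights, expert_outputs)  # (batch, feature_dim)
```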
I first run the following command to generate codes on the ImageNet dataset `torchrun --nproc_per_node 2 autoregressive/train/extract_codes_c2i.py --vq-model VQ-16 --vq-ckpt ./vq_ds16_c2i.pt --data-path xxx --code-path xxx --image-size 256` and then run...