Baijiong Lin
Yep, I have also run into this issue. It is because the ``out_proj`` in ``MultiheadAttention`` is a ``NonDynamicallyQuantizableLinear`` rather than a plain ``Linear`` layer. https://github.com/pytorch/pytorch/blob/dbb96ef30da4e50bdbecb56dfb9b2c43b8a39e9d/torch/nn/modules/activation.py#L1008
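For illustration, here is a minimal sketch (assuming a recent PyTorch version) of why an exact-type check on ``nn.Linear`` misses ``out_proj``, even though it still subclasses ``Linear``:

```python
import torch.nn as nn
from torch.nn.modules.linear import NonDynamicallyQuantizableLinear

mha = nn.MultiheadAttention(embed_dim=64, num_heads=4)

print(isinstance(mha.out_proj, NonDynamicallyQuantizableLinear))  # True
print(isinstance(mha.out_proj, nn.Linear))                        # True: it subclasses Linear
print(type(mha.out_proj) is nn.Linear)                            # False: exact-type checks skip it
```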
Sounds like a good idea. I will try to fix it (maybe in two weeks or so; I am busy with some deadlines at the moment).
@mounchiliu Thanks for your suggestion. I have fixed this problem.
@marcomistretta @ghost Sorry for the late reply. The LoRA of ``out_proj`` is not updated because of an incorrect initialization of the LoRA weights, not because of the use of ``NonDynamicallyQuantizableLinear``. I have...
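For context, the standard LoRA initialization keeps ``lora_A`` random and ``lora_B`` at zero, so the adapter starts as a no-op but still receives gradients. A generic sketch (not the exact code from the fix):

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper around a frozen Linear layer."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r
        # A is random, B is zero: the initial LoRA update is zero,
        # but gradients flow to both A and B during training.
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```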
I suggest using our re-implementation https://github.com/median-research-group/LibMTL/blob/main/LibMTL/weighting/Aligned_MTL.py.
I have the same issue.
What does ``share_expert_gate`` refer to?
(1) Our implementation does seem to be missing the shared expert gate. I will correct this error later; thanks for pointing it out. (2) This line only instantiates an encoder https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L110, and the forward of ``_transform_resnet_PLE`` is actually called here https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L115. (3) Yes, we use ResNet here, so it is set to 5 layers.
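For reference, a shared expert gate in a PLE-style layer typically mixes the outputs of all experts (shared plus every task's experts) with a softmax gate, whereas a task-specific gate only sees the shared experts and that task's experts. A minimal illustrative sketch (not LibMTL's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertGate(nn.Module):
    """Illustrative shared-expert gate for a PLE-style layer."""
    def __init__(self, input_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, gate_input, expert_outputs):
        # gate_input:     (batch, input_dim)
        # expert_outputs: (batch, num_experts, feature_dim), shared + all task experts
        weights = F.softmax(self.gate(gate_input), dim=-1)          # (batch, num_experts)
        return torch.einsum('be,bed->bd', weights, expert_outputs)  # (batch, feature_dim)
```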
I first run the following command to generate codes on the ImageNet dataset `torchrun --nproc_per_node 2 autoregressive/train/extract_codes_c2i.py --vq-model VQ-16 --vq-ckpt ./vq_ds16_c2i.pt --data-path xxx --code-path xxx --image-size 256` and then run...