Baijiong Lin

11 comments by Baijiong Lin

Yep, I also noticed this issue. It is because the ``out_proj`` in ``MultiheadAttention`` is a ``NonDynamicallyQuantizableLinear`` rather than a plain ``Linear`` layer: https://github.com/pytorch/pytorch/blob/dbb96ef30da4e50bdbecb56dfb9b2c43b8a39e9d/torch/nn/modules/activation.py#L1008
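For illustration (this snippet is not from the original thread), a minimal check in PyTorch showing why this trips up module matching: ``NonDynamicallyQuantizableLinear`` subclasses ``Linear``, so code that filters modules with an exact-type comparison like ``type(m) == nn.Linear`` will silently skip ``out_proj``, while an ``isinstance`` check still matches it.

```python
import torch.nn as nn
from torch.nn.modules.linear import NonDynamicallyQuantizableLinear

mha = nn.MultiheadAttention(embed_dim=64, num_heads=8)

print(type(mha.out_proj))                                        # NonDynamicallyQuantizableLinear
print(type(mha.out_proj) == nn.Linear)                           # False: exact-type matching misses it
print(isinstance(mha.out_proj, nn.Linear))                       # True: it subclasses Linear
print(isinstance(mha.out_proj, NonDynamicallyQuantizableLinear)) # True
```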

Sounds like a good idea. I will try to fix it (maybe in about two weeks; I am busy with some deadlines currently).

@mounchiliu Thanks for your suggestion. I have fixed this problem.

@marcomistretta @ghost Sorry for the late reply. The LoRA module of ``out_proj`` is not updated because of an incorrect initialization of the LoRA weights, not because of the use of ``NonDynamicallyQuantizableLinear``. I have...
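For context, here is a minimal sketch of the standard LoRA initialization convention (following the LoRA paper: ``A`` is randomly initialized, ``B`` starts at zero, so the adapter is an exact no-op at step 0 but still receives gradients). This is an illustrative assumption with hypothetical names, not the library's actual fix:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update: y = Wx + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pretrained weight and bias
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # B = 0 at init
        nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)              # A random at init
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

If ``B`` is instead initialized randomly (or ``A`` and ``B`` are both zero), the adapter either perturbs the pretrained output at step 0 or receives zero gradient, which matches the symptom of ``out_proj`` appearing not to update.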

I suggest using our re-implementation https://github.com/median-research-group/LibMTL/blob/main/LibMTL/weighting/Aligned_MTL.py.

What does ``share_expert_gate`` refer to?

(1) Our implementation is indeed missing the shared expert gate; I will correct this error later. Thanks for pointing it out. (2) This line only instantiates an encoder: https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L110 The forward of ``_transform_resnet_PLE`` is only called here: https://github.com/median-research-group/LibMTL/blob/45705f2dbc6118b07ff78dfc6425cacf47c8f740/LibMTL/architecture/PLE.py#L115 (3) Yes, we use a ResNet here, so it is set to 5 levels.
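To make the missing piece concrete, below is a simplified sketch of one PLE extraction layer that includes the shared-expert gate: each task gate mixes its own experts with the shared experts, while the shared gate mixes all experts to produce the shared branch passed to the next layer. Names and shapes are hypothetical simplifications of the PLE paper, not LibMTL's actual code:

```python
import torch
import torch.nn as nn

class PLELayer(nn.Module):
    """One PLE extraction layer with per-task gates and a shared-expert gate."""
    def __init__(self, in_dim, out_dim, num_task_experts, num_shared_experts, num_tasks):
        super().__init__()
        self.task_experts = nn.ModuleList([
            nn.ModuleList([nn.Linear(in_dim, out_dim) for _ in range(num_task_experts)])
            for _ in range(num_tasks)])
        self.shared_experts = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_shared_experts)])
        # each task gate mixes its own experts with the shared experts
        self.task_gates = nn.ModuleList([
            nn.Linear(in_dim, num_task_experts + num_shared_experts)
            for _ in range(num_tasks)])
        # the shared-expert gate mixes ALL experts to form the shared branch
        self.shared_gate = nn.Linear(
            in_dim, num_tasks * num_task_experts + num_shared_experts)

    @staticmethod
    def _mix(expert_outs, gate_logits):
        w = torch.softmax(gate_logits, dim=-1)        # (B, num_experts)
        stacked = torch.stack(expert_outs, dim=-1)    # (B, out_dim, num_experts)
        return (stacked * w.unsqueeze(1)).sum(dim=-1) # (B, out_dim)

    def forward(self, task_inputs, shared_input):
        shared_outs = [e(shared_input) for e in self.shared_experts]
        all_outs, task_feats = [], []
        for t, x in enumerate(task_inputs):
            outs = [e(x) for e in self.task_experts[t]]
            all_outs.extend(outs)
            task_feats.append(self._mix(outs + shared_outs, self.task_gates[t](x)))
        shared_feat = self._mix(all_outs + shared_outs, self.shared_gate(shared_input))
        return task_feats, shared_feat
```

Without ``shared_gate``, the shared branch has no learned mixture of its own, which is the gap noted in (1).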

I first run the following command to extract codes on the ImageNet dataset `torchrun --nproc_per_node 2 autoregressive/train/extract_codes_c2i.py --vq-model VQ-16 --vq-ckpt ./vq_ds16_c2i.pt --data-path xxx --code-path xxx --image-size 256` and then run...