千橙

Results: 11 comments by 千橙

If you know a bit of Spring Boot, you might want to check out the Java implementation: https://github.com/iqiancheng/sparrow-recsys-spring-boot — happy to discuss.


I reverted modelscope to version 1.29.0 and it works:

```
pip install modelscope==1.29.0
```

Hope this helps.

Hi everyone, I've implemented a script to merge the LoRA adapter weights from `adapter_model.pt` into the base model, covering all linear layers: `['q_proj', 'k_proj', 'v_proj', 'output_proj', 'w1', 'w2', 'w3', 'output']`....
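The merge described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the state-dict key layout (`<layer>.lora_a` / `<layer>.lora_b`), the `alpha` and `rank` values, and the NumPy arrays standing in for tensors are all assumptions; the standard LoRA merge rule is W' = W + (alpha / rank) · B @ A applied to each targeted linear layer.

```python
import numpy as np

# Linear layers named in the comment above.
TARGET_MODULES = ['q_proj', 'k_proj', 'v_proj', 'output_proj',
                  'w1', 'w2', 'w3', 'output']

def merge_lora(base_weight, lora_a, lora_b, alpha=16, rank=8):
    """Fold one LoRA adapter into a base linear-layer weight.

    Shapes: base_weight (out, in), lora_a (rank, in), lora_b (out, rank).
    Returns W + (alpha / rank) * B @ A.
    """
    scaling = alpha / rank
    return base_weight + scaling * lora_b @ lora_a

def merge_state_dict(base_sd, adapter_sd, alpha=16, rank=8):
    """Return a new state dict with every targeted layer merged.

    Assumes (hypothetically) that the adapter checkpoint stores its
    factors under '<weight key>.lora_a' and '<weight key>.lora_b'.
    """
    merged = dict(base_sd)
    for name, weight in base_sd.items():
        a_key, b_key = f"{name}.lora_a", f"{name}.lora_b"
        if any(t in name for t in TARGET_MODULES) and a_key in adapter_sd:
            merged[name] = merge_lora(weight, adapter_sd[a_key],
                                      adapter_sd[b_key], alpha, rank)
    return merged
```

After merging, the resulting state dict can be saved and loaded like a plain base-model checkpoint, with no adapter machinery needed at inference time.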

> Hi [@iqiancheng](https://github.com/iqiancheng) , > > We have initial support for Qwen3-VL-30B-A3B with FSDP2 (please see [here](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen3) for recipes), and we are planning to also support the 235B variant. >...

hi~ @smallscientist1 Regarding the combinations of qk and v dimensions you've implemented in FlashAttention-2, which configuration have you found to offer the best balance between performance and model effectiveness? Specifically,...

I found related upstream issues: [Implement aten.select.int sharding strategy](https://github.com/pytorch/pytorch/pull/149842) and [[DTensor] [distributed]: Operator aten.select.int does not have a sharding strategy registered · Issue #147724 · pytorch/pytorch](https://github.com/pytorch/pytorch/issues/147724), but the PyTorch PR failed to merge.

> Hi! A little bit off but I am just a little curious about the tp_plan of qwen3 since I cannot find this model component under ./torchtune/models/qwen2 or qwen3. Did...

👍 Excited to find out what the next WIP project will be called!