laoda513

Results: 23 comments by laoda513

https://github.com/microsoft/DeepSpeedExamples/issues/503 seems to be the same issue as this one. I provided some detailed info there.

Also, is there an official group chat?

After reading the paper, my understanding is that this is mainly a training-time optimizer that, compared with traditional optimizers such as Adam, greatly reduces memory usage during training. Is that the right way to understand it?
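My rough back-of-the-envelope reasoning for why the optimizer states dominate (these numbers are my own illustrative assumptions, not from the paper):

```python
# Rough, illustrative estimate of optimizer-state memory for full fine-tuning.
# Assumptions (mine, not from the paper): 7B parameters, fp32 Adam states,
# plus an fp32 master copy of the weights for mixed-precision training.
params = 7e9                 # parameter count of a 7B model
adam_state_bytes = 2 * 4     # exp_avg + exp_avg_sq, 4 bytes each in fp32
master_weight_bytes = 4      # fp32 master weights

adam_gb = params * adam_state_bytes / 1024**3
master_gb = params * master_weight_bytes / 1024**3
print(f"Adam states:  ~{adam_gb:.0f} GB")   # ~52 GB
print(f"fp32 weights: ~{master_gb:.0f} GB") # ~26 GB
```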

> I'm not familiar with the format that AutoGPTQ produces LoRAs in. Whether it's supported or not depends on what the resulting tensors look like. If they're FP16 and they...

https://drive.google.com/drive/folders/1jQSPOb9i6QKH4kwmBcG4k71z_VC15hAF?usp=sharing The 65B is too large, so I made a 7B LoRA instead. The log is: base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight ## Error: unsupported layer in loras/7b__qLORA_adapter2/adapter_model.bin: base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight

It seems fused attention is the root cause. If the model was loaded with fuse_attention enabled, q_proj, k_proj, and v_proj are combined into a single qkv_proj.
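For reference, this is roughly how I looked at the adapter keys (just a quick check; the path is my local example):

```python
import torch

# Inspect which layer names the qLoRA adapter actually contains.
sd = torch.load("loras/7b__qLORA_adapter2/adapter_model.bin", map_location="cpu")
for name, tensor in sd.items():
    print(name, tuple(tensor.shape), tensor.dtype)
# With fuse_attention enabled the keys show up as ...self_attn.qkv_proj.lora_A/B...
# instead of the separate q_proj / k_proj / v_proj keys the loader expects.
```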

Well, I guess it's a bad idea to convert back. I'm not sure if it's possible or worthwhile for exllama to support the fused format itself, so people could just use fused...
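Just to sketch what "converting back" would roughly mean. This is only a sketch under my own unverified assumption that the fused lora_B simply stacks q, k, v in equal chunks along the output dimension (which should hold for the original LLaMA shapes without grouped heads), and that lora_A is shared by all three:

```python
import torch

# Sketch only: split a fused qkv_proj LoRA back into separate q/k/v entries.
src = torch.load("loras/7b__qLORA_adapter2/adapter_model.bin", map_location="cpu")
dst = {}
for name, w in src.items():
    if ".qkv_proj.lora_A." in name:
        # lora_A has shape (r, hidden) and is shared, so duplicate it per projection
        for proj in ("q_proj", "k_proj", "v_proj"):
            dst[name.replace("qkv_proj", proj)] = w.clone()
    elif ".qkv_proj.lora_B." in name:
        # lora_B has shape (3 * hidden, r); split along dim 0 into q, k, v parts
        q, k, v = w.chunk(3, dim=0)
        for proj, part in zip(("q_proj", "k_proj", "v_proj"), (q, k, v)):
            dst[name.replace("qkv_proj", proj)] = part.clone()
    else:
        dst[name] = w
torch.save(dst, "adapter_model_unfused.bin")
```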

> Okay, I did a experimental PR to see if turbo wants to add it, or maybe testing it via other way. > > #118 so for using this feature,...

How does the memory cost increase for inference and training? Is it linear, for example 1x for 2k and 2x for 4k? I think this is very exciting and...
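My rough reasoning for why I'd expect the inference side to be linear, assuming a 7B-shaped model (32 layers, 32 heads, head dim 128) with an fp16 KV cache; these shapes are my own assumption, just to illustrate the scaling:

```python
# Rough KV-cache estimate; only illustrates the linear growth with context length.
def kv_cache_gb(seq_len, layers=32, heads=32, head_dim=128, bytes_per=2):
    # 2 tensors (K and V) per layer, one head_dim vector per head per token
    return 2 * layers * heads * head_dim * seq_len * bytes_per / 1024**3

for ctx in (2048, 4096, 8192):
    print(ctx, f"{kv_cache_gb(ctx):.1f} GB")
# 2048 -> ~1.0 GB, 4096 -> ~2.0 GB, 8192 -> ~4.0 GB (linear in context length)
```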

Thanks! Hmm, although that sounds complicated if I want to save multiple LoRA copies... otherwise it would take too much disk space...