laoda513

Results: 23 comments by laoda513

https://github.com/microsoft/DeepSpeedExamples/issues/503 seems to be the same issue as this one. I provided some detailed info there.

Also, is there an official group chat?

After reading the paper, my understanding is that this is mainly a training-time optimizer that, compared with traditional optimizers such as Adam, greatly reduces memory usage during training. Is that the right way to understand it?
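My rough back-of-the-envelope reasoning for why the optimizer states dominate (these numbers are my own illustrative assumptions, not from the paper):

```python
# Rough, illustrative estimate of optimizer-state memory for full fine-tuning.
# Assumptions (mine, not from the paper): 7B parameters, fp32 Adam states,
# plus an fp32 master copy of the weights for mixed-precision training.
params = 7e9                 # parameter count of a 7B model
adam_state_bytes = 2 * 4     # exp_avg + exp_avg_sq, 4 bytes each in fp32
master_weight_bytes = 4      # fp32 master weights

adam_gb = params * adam_state_bytes / 1024**3
master_gb = params * master_weight_bytes / 1024**3
print(f"Adam states:  ~{adam_gb:.0f} GB")   # ~52 GB
print(f"fp32 weights: ~{master_gb:.0f} GB") # ~26 GB
```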

> I'm not familiar with the format that AutoGPTQ produces LoRAs in. Whether it's supported or not depends on what the resulting tensors look like. If they're FP16 and they...

https://drive.google.com/drive/folders/1jQSPOb9i6QKH4kwmBcG4k71z_VC15hAF?usp=sharing The 65B is too large, so I made a 7B LoRA instead. The log is: base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight ## Error: unsupported layer in loras/7b__qLORA_adapter2/adapter_model.bin: base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight

It seems fused attention is the root cause. If the model was loaded with fuse_attention enabled, q_proj, k_proj, and v_proj are combined into a single qkv_proj.
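For reference, this is roughly how I looked at the adapter keys (just a quick check; the path is my local example):

```python
import torch

# Inspect which layer names the qLoRA adapter actually contains.
sd = torch.load("loras/7b__qLORA_adapter2/adapter_model.bin", map_location="cpu")
for name, tensor in sd.items():
    print(name, tuple(tensor.shape), tensor.dtype)
# With fuse_attention enabled the keys show up as ...self_attn.qkv_proj.lora_A/B...
# instead of the separate q_proj / k_proj / v_proj keys the loader expects.
```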

Well, I guess it's a bad idea to convert back. I'm not sure if it's possible or worthwhile for exllama to support the fused format itself, so people could just use fused...
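Just to sketch what "converting back" would roughly mean. This is only a sketch under my own unverified assumption that the fused lora_B simply stacks q, k, v in equal chunks along the output dimension (which should hold for the original LLaMA shapes without grouped heads), and that lora_A is shared by all three:

```python
import torch

# Sketch only: split a fused qkv_proj LoRA back into separate q/k/v entries.
src = torch.load("loras/7b__qLORA_adapter2/adapter_model.bin", map_location="cpu")
dst = {}
for name, w in src.items():
    if ".qkv_proj.lora_A." in name:
        # lora_A has shape (r, hidden) and is shared, so duplicate it per projection
        for proj in ("q_proj", "k_proj", "v_proj"):
            dst[name.replace("qkv_proj", proj)] = w.clone()
    elif ".qkv_proj.lora_B." in name:
        # lora_B has shape (3 * hidden, r); split along dim 0 into q, k, v parts
        q, k, v = w.chunk(3, dim=0)
        for proj, part in zip(("q_proj", "k_proj", "v_proj"), (q, k, v)):
            dst[name.replace("qkv_proj", proj)] = part.clone()
    else:
        dst[name] = w
torch.save(dst, "adapter_model_unfused.bin")
```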

> Okay, I did a experimental PR to see if turbo wants to add it, or maybe testing it via other way. > > #118 so for using this feature,...

How does the memory cost increase for inference and training? Is it linear, for example 1x for 2k and 2x for 4k? I think this is very exciting and...
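My rough reasoning for why I'd expect the inference side to be linear, assuming a 7B-shaped model (32 layers, 32 heads, head dim 128) with an fp16 KV cache; these shapes are my own assumption, just to illustrate the scaling:

```python
# Rough KV-cache estimate; only illustrates the linear growth with context length.
def kv_cache_gb(seq_len, layers=32, heads=32, head_dim=128, bytes_per=2):
    # 2 tensors (K and V) per layer, one head_dim vector per head per token
    return 2 * layers * heads * head_dim * seq_len * bytes_per / 1024**3

for ctx in (2048, 4096, 8192):
    print(ctx, f"{kv_cache_gb(ctx):.1f} GB")
# 2048 -> ~1.0 GB, 4096 -> ~2.0 GB, 8192 -> ~4.0 GB (linear in context length)
```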

Thanks! Hmm, although that sounds complicated if I want to save multiple LoRA copies... otherwise it would take too much disk space...