bjelkenhed
Thank you for providing this code, however I get the same results that are described by Pang-dachu above. The trained LoRA adapters do not seem to have any effect on...
Thank you @Rinatum for all your suggestions. I am now trying something similar to @Pang-dachu's approach, without DeepSpeed and using MistralForSequenceEmbedding when loading the model, and it looks promising so far....
Hi, here are some updates from me. Without DeepSpeed ZeRO-3 it works much better and no LoRA layers end up all zeros. That makes the training work as expected and the results...
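For anyone else hitting this, a quick way to confirm the symptom is to scan the saved adapter weights for LoRA tensors that are entirely zero. This is a minimal, framework-agnostic sketch (`find_zeroed_lora` is a hypothetical helper, not from this thread); tensors are shown as flat lists of floats, but with PyTorch you would check `tensor.abs().sum() == 0` on each `state_dict()` entry instead.

```python
def find_zeroed_lora(state_dict):
    """Return the names of LoRA parameters whose values are all zero."""
    zeroed = []
    for name, values in state_dict.items():
        # Only inspect LoRA adapter tensors, not the frozen base weights.
        if "lora" in name and all(v == 0.0 for v in values):
            zeroed.append(name)
    return zeroed

# Toy example: after training, an all-zero adapter tensor would indicate
# the ZeRO-3 issue described above (note that lora_B is initialized to
# zero by design, so check after training, not at initialization).
params = {
    "base.q_proj.weight": [0.1, -0.2],
    "base.q_proj.lora_A.weight": [0.0, 0.0],
    "base.q_proj.lora_B.weight": [0.03, 0.07],
}
print(find_zeroed_lora(params))  # -> ['base.q_proj.lora_A.weight']
```

If this reports every LoRA tensor as zeroed after training, the adapters cannot have any effect on the outputs, which matches what Pang-dachu and I observed.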