bjelkenhed
Thank you for providing this code, however I get the same results that are described by Pang-dachu above. The trained LoRA adapters do not seem to have any effect on...
Thank you @Rinatum for all your suggestions. I am now trying something similar to @Pang-dachu's approach, without DeepSpeed and using MistralForSequenceEmbedding when loading the model, and it looks promising so far....
Hi, here are some updates from me. Without DeepSpeed ZeRO-3 it works much better and no LoRA layers end up all zeros. That makes the training work as expected and the results...
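For anyone else hitting this, a quick way to confirm the symptom is to scan the saved adapter weights for LoRA tensors that are entirely zero. This is a minimal, framework-agnostic sketch (`find_zeroed_lora` is a hypothetical helper, not from this thread); tensors are shown as flat lists of floats, but with PyTorch you would check `tensor.abs().sum() == 0` on each `state_dict()` entry instead.

```python
def find_zeroed_lora(state_dict):
    """Return the names of LoRA parameters whose values are all zero."""
    zeroed = []
    for name, values in state_dict.items():
        # Only inspect LoRA adapter tensors, not the frozen base weights.
        if "lora" in name and all(v == 0.0 for v in values):
            zeroed.append(name)
    return zeroed

# Toy example: after training, an all-zero adapter tensor would indicate
# the ZeRO-3 issue described above (note that lora_B is initialized to
# zero by design, so check after training, not at initialization).
params = {
    "base.q_proj.weight": [0.1, -0.2],
    "base.q_proj.lora_A.weight": [0.0, 0.0],
    "base.q_proj.lora_B.weight": [0.03, 0.07],
}
print(find_zeroed_lora(params))  # -> ['base.q_proj.lora_A.weight']
```

If this reports every LoRA tensor as zeroed after training, the adapters cannot have any effect on the outputs, which matches what Pang-dachu and I observed.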