John Lamprou
LoRA is not compatible with `nn.Parameter`, so you can't train the gate with LoRA. You can switch it to `nn.Embedding`, which works with LoRA, but it needs a little modification on...
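Roughly, the swap looks like this (a minimal sketch with made-up module names, not the actual SwitchTransformers classes):

```python
import torch
import torch.nn as nn

class ParamGate(nn.Module):
    """Gate stored as a bare nn.Parameter -- PEFT/LoRA cannot wrap this."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, hidden_dim))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) @ (hidden, num_experts) -> router logits
        return hidden_states @ self.weight.t()

class EmbeddingGate(nn.Module):
    """Same math, but the weight lives in an nn.Embedding, which LoRA can target."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Embedding(num_experts, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # nn.Embedding.weight is (num_experts, hidden_dim), so the matmul is identical.
        return hidden_states @ self.gate.weight.t()
```

With PEFT you would then list the gate's module name (here `gate`) in `LoraConfig(target_modules=[...])`; LoRA can wrap `nn.Embedding` layers, but not bare `nn.Parameter`s.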
I am currently testing this. I have implemented it for [switchtransformers](https://huggingface.co/docs/transformers/main/en/model_doc/switch_transformers#switchtransformers) on an NVIDIA A100 40GB, training on the MLM task with LR=1.5e-4. Loss scaling is very unstable. Lowering...
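For context, a minimal sketch of such a setup, assuming the `galore-torch` package's `GaLoreAdamW` and the `google/switch-base-8` checkpoint (the GaLore hyperparameters below are the README defaults, not tuned values; only the LR matches the run described above):

```python
import torch
from transformers import SwitchTransformersForConditionalGeneration
from galore_torch import GaLoreAdamW

model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-8", torch_dtype=torch.bfloat16
)

# GaLore projects gradients of 2D weight matrices into a low-rank subspace;
# biases, layer norms and other 1D params get ordinary AdamW updates.
galore_params, regular_params = [], []
for _, p in model.named_parameters():
    (galore_params if p.ndim == 2 else regular_params).append(p)

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1.5e-4)
```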
@b-albar Sorry, I didn't clarify: I'm using bf16, but the Switch architecture is tricky with mixed precision too (they use selective mixed precision, so some layers are bf16 and others are...
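The pattern is roughly this (a toy sketch of the idea, not the actual HF or Mesh-TensorFlow router code):

```python
import torch
import torch.nn as nn

class StableRouter(nn.Module):
    """Toy Switch-style top-1 router showing selective mixed precision:
    gating weights, logits and softmax stay in fp32 while the surrounding
    model and activations run in bf16."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        # Keep the router weights in full precision.
        self.classifier = nn.Linear(hidden_dim, num_experts, bias=False).float()

    def forward(self, hidden_states: torch.Tensor):
        input_dtype = hidden_states.dtype                 # e.g. torch.bfloat16
        logits = self.classifier(hidden_states.float())   # gate matmul in fp32
        probs = torch.softmax(logits, dim=-1)             # softmax in fp32
        expert_index = probs.argmax(dim=-1)
        return expert_index, probs.to(input_dtype)        # back to bf16 downstream

router = StableRouter(hidden_dim=768, num_experts=8)
x = torch.randn(2, 16, 768, dtype=torch.bfloat16)
expert_index, gate_probs = router(x)
```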
> @b-albar Sorry, I didn't clarify: I'm using bf16, but the Switch architecture is tricky with mixed precision too (they use selective mixed precision, so some layers are bf16 and...
+1

> [Jamba](https://huggingface.co/ai21labs/Jamba-v0.1) is a very interesting new model, and I'd love to add support for GaLore for fine-tuning it. It's an MoE+Transformer+Mamba hybrid, so I'm not sure how that...
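One possible shape for this, leaning on the GaLore integration in `transformers`' `Trainer` (a sketch only: the Jamba projection names and the required `transformers`/`accelerate`/`galore-torch` versions are assumptions to verify):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1", torch_dtype=torch.bfloat16, device_map="auto"
)

# transformers' built-in GaLore support (requires `pip install galore-torch`).
# The module-name patterns below are guesses at Jamba's attention/MLP projections
# and should be checked against model.named_modules().
args = TrainingArguments(
    output_dir="jamba-galore",
    per_device_train_batch_size=1,
    learning_rate=1.5e-4,
    bf16=True,
    optim="galore_adamw",
    optim_target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset omitted here
```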