John Lamprou
LoRA is not compatible with `nn.Parameter`, so you can't train the gate with LoRA. You can switch it to `nn.Embedding`, which works with LoRA, but it needs a little modification on...
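Roughly, the swap looks like this (a minimal sketch with made-up module names, not the actual SwitchTransformers classes):

```python
import torch
import torch.nn as nn

class ParamGate(nn.Module):
    """Gate stored as a bare nn.Parameter -- PEFT/LoRA cannot wrap this."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, hidden_dim))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) @ (hidden, num_experts) -> router logits
        return hidden_states @ self.weight.t()

class EmbeddingGate(nn.Module):
    """Same math, but the weight lives in an nn.Embedding, which LoRA can target."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Embedding(num_experts, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # nn.Embedding.weight is (num_experts, hidden_dim), so the matmul is identical.
        return hidden_states @ self.gate.weight.t()
```

With PEFT you would then list the gate's module name (here `gate`) in `LoraConfig(target_modules=[...])`; LoRA can wrap `nn.Embedding` layers, but not bare `nn.Parameter`s.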
I am currently testing this. I have implemented it for [switchtransformers](https://huggingface.co/docs/transformers/main/en/model_doc/switch_transformers#switchtransformers) on an NVIDIA A100 40GB, training on the MLM task with LR=1.5e-4. Loss scaling is very unstable. Lowering...
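For context, a minimal sketch of such a setup, assuming the `galore-torch` package's `GaLoreAdamW` and the `google/switch-base-8` checkpoint (the GaLore hyperparameters below are the README defaults, not tuned values; only the LR matches the run described above):

```python
import torch
from transformers import SwitchTransformersForConditionalGeneration
from galore_torch import GaLoreAdamW

model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-base-8", torch_dtype=torch.bfloat16
)

# GaLore projects gradients of 2D weight matrices into a low-rank subspace;
# biases, layer norms and other 1D params get ordinary AdamW updates.
galore_params, regular_params = [], []
for _, p in model.named_parameters():
    (galore_params if p.ndim == 2 else regular_params).append(p)

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1.5e-4)
```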
@b-albar Sorry, I didn't clarify: I'm using bf16, but the Switch architecture is tricky with mixed precision too (they use selective mixed precision, so some layers are bf16 and others are...
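The pattern is roughly this (a toy sketch of the idea, not the actual HF or Mesh-TensorFlow router code):

```python
import torch
import torch.nn as nn

class StableRouter(nn.Module):
    """Toy Switch-style top-1 router showing selective mixed precision:
    gating weights, logits and softmax stay in fp32 while the surrounding
    model and activations run in bf16."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        # Keep the router weights in full precision.
        self.classifier = nn.Linear(hidden_dim, num_experts, bias=False).float()

    def forward(self, hidden_states: torch.Tensor):
        input_dtype = hidden_states.dtype                 # e.g. torch.bfloat16
        logits = self.classifier(hidden_states.float())   # gate matmul in fp32
        probs = torch.softmax(logits, dim=-1)             # softmax in fp32
        expert_index = probs.argmax(dim=-1)
        return expert_index, probs.to(input_dtype)        # back to bf16 downstream

router = StableRouter(hidden_dim=768, num_experts=8)
x = torch.randn(2, 16, 768, dtype=torch.bfloat16)
expert_index, gate_probs = router(x)
```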
> @b-albar Sorry, I didn't clarify: I'm using bf16, but the Switch architecture is tricky with mixed precision too (they use selective mixed precision, so some layers are bf16 and...
+1

> [Jamba](https://huggingface.co/ai21labs/Jamba-v0.1) is a very interesting new model, and I'd love to add support for GaLore for fine-tuning it. It's an MoE+Transformer+Mamba hybrid, so I'm not sure how that...
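One possible shape for this, leaning on the GaLore integration in `transformers`' `Trainer` (a sketch only: the Jamba projection names and the required `transformers`/`accelerate`/`galore-torch` versions are assumptions to verify):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1", torch_dtype=torch.bfloat16, device_map="auto"
)

# transformers' built-in GaLore support (requires `pip install galore-torch`).
# The module-name patterns below are guesses at Jamba's attention/MLP projections
# and should be checked against model.named_modules().
args = TrainingArguments(
    output_dir="jamba-galore",
    per_device_train_batch_size=1,
    learning_rate=1.5e-4,
    bf16=True,
    optim="galore_adamw",
    optim_target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset omitted here
```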