Chime Ogbuji
```
# M1 Ultra 128GB
# transformer_lm % python main.py
Training a transformer with 153.883 M parameters
Iter 10: Train loss 8.963, It/sec 0.347
Iter 20: Train loss 8.379, It/sec 0.354
...
```
If just training on a raw corpus, I have been using the raw text, per lora.py. For instruction datasets, I have been wrapping the input in the Mistral prompt format before...
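A minimal sketch of that wrapping step, assuming an instruction dataset with `instruction`/`output` fields (those field names and the file paths are placeholders, not anything from lora.py):

```python
import json

# Wrap each example in the Mistral instruction format and emit the
# single "text" field used by the LoRA training data.
def to_mistral_prompt(example):
    return f"<s>[INST] {example['instruction']} [/INST] {example['output']}</s>"

with open("instructions.jsonl") as src, open("train.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        dst.write(json.dumps({"text": to_mistral_prompt(example)}) + "\n")
```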
The recent changes that allow the *loss* and *iterate_batches* functions to be specified for the tuning process have made doing this a lot more straightforward. I have done...
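As an illustration of the kind of thing this enables, a rough sketch of a completion-only loss that masks out prompt tokens; the `(model, inputs, targets, lengths)` signature and the `(total_length, prompt_length)` layout of `lengths` are assumptions about a matching custom `iterate_batches`, not the library defaults:

```python
import mlx.core as mx
import mlx.nn as nn

def completion_only_loss(model, inputs, targets, lengths):
    # lengths is assumed to be (batch, 2): column 0 holds the full
    # sequence length, column 1 the prompt length to mask out.
    logits = model(inputs)
    ce = nn.losses.cross_entropy(logits, targets, reduction="none")
    steps = mx.arange(targets.shape[1]).reshape(1, -1)
    in_sequence = steps < lengths[:, 0:1]
    past_prompt = steps >= lengths[:, 1:2]
    mask = mx.logical_and(in_sequence, past_prompt).astype(ce.dtype)
    ntoks = mask.sum()
    return (ce * mask).sum() / ntoks, ntoks
```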
This is a great idea and another thing I wouldn't have to roll my own version of. The only thing I would add is a request for SGDR (see [cyclic-cosine-decay](https://github.com/abhuse/cyclic-cosine-decay))...
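For concreteness, SGDR is just cosine annealing restarted with progressively longer cycles; a minimal pure-Python sketch (all hyperparameter values below are made-up illustrations), which could be queried once per iteration to set the optimizer's learning rate:

```python
import math

def sgdr_lr(step, base_lr=1e-5, min_lr=1e-6, first_cycle=200, cycle_mult=2):
    """Cyclic cosine decay with warm restarts (SGDR)."""
    cycle_len = first_cycle
    # Walk forward through the cycles until we find the one `step` is in.
    while step >= cycle_len:
        step -= cycle_len
        cycle_len *= cycle_mult
    progress = step / cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```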
Another pass at separating model-specific bits from training logic. Still keeping an eye on #213 to see if there is any synergy.
> Just pushed (proposed) final version of #213. Take a look and let me know how I can help utilize our changes together!

That would be fantastic! Sorry I...
> Is that the default `train.jsonl` or a custom one? You should split those lines o/w they will consume a ton of memory. See the section on [reducing memory use](https://github.com/ml-explore/mlx-examples/tree/main/lora#Memory-Issues).

...
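A throwaway sketch of that splitting step, in case it is useful (the file names and the character-based chunk size are arbitrary illustrations; splitting on sentence or token boundaries would be better):

```python
import json

def split_long_lines(src_path="train.jsonl", dst_path="train_split.jsonl", max_chars=2048):
    # Break each oversized "text" entry into smaller records so no single
    # line blows up memory during batching.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            text = json.loads(line)["text"]
            for start in range(0, len(text), max_chars):
                chunk = text[start:start + max_chars]
                dst.write(json.dumps({"text": chunk}) + "\n")
```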
#235 is dated. I can rebase it to mlx-examples/main and update it (to support the parameters that have been added since I last worked on that PR) if there is...
> I think a more sustainable way to do this is the following:
>
> * Have a field in the Yaml which gives the layers keys to apply LoRA...
OK. I have also incorporated that into this PR.
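A minimal sketch of what that YAML-driven selection could look like (the field names and example key paths are illustrative, not the exact schema in the PR):

```python
import yaml

config = yaml.safe_load("""
lora:
  keys: ["self_attn.q_proj", "self_attn.v_proj"]   # which layer keys get LoRA
  rank: 8
""")

def select_lora_targets(parameter_paths, keys):
    # Adapt any module whose path ends with one of the configured keys.
    return [p for p in parameter_paths if any(p.endswith(k) for k in keys)]

paths = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.gate_proj",
]
print(select_lora_targets(paths, config["lora"]["keys"]))
# -> ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.v_proj']
```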