Todd Mostak
# Merge Checklist

## :wrench: Issue(s) fixed:

- [ ] Author referenced issue(s) fixed by this PR:
- [ ] Fixes #0

## :smoking: Smoke Test

- [ ] Works...
### 🚀 Feature

Support for specification of a minimum learning rate

### Motivation

Often in the research literature minimum learning rates are set when fine-tuning a model using a cosine...
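A minimal sketch of what the requested floor could map to, assuming the trainer builds on PyTorch's built-in cosine scheduler (the `min_learning_rate` name is a hypothetical config key, not an existing setting):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(16, 16)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# eta_min is the value the cosine schedule decays to instead of 0;
# a hypothetical `min_learning_rate` config value would be passed here.
scheduler = CosineAnnealingLR(optimizer, T_max=1000, eta_min=2e-6)

for step in range(1000):
    optimizer.step()
    scheduler.step()
```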
### 🚀 Feature

Add the capability to the UI to kick off a grid-search over a set of hyperparameters (with specified search increments for continuous parameters, and specified attributes for...
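A rough sketch of the search-space expansion such a grid search implies, assuming each hyperparameter is reduced to a list of discrete values (the parameter names below are illustrative, not an existing config schema):

```python
from itertools import product

# Illustrative search space: continuous parameters pre-expanded into
# discrete increments, categorical parameters listed directly.
search_space = {
    "learning_rate": [1e-5, 5e-5, 1e-4],
    "lora_r": [8, 16, 32],
    "lora_dropout": [0.0, 0.05, 0.1],
}

# One training run per point in the Cartesian product of all values.
for values in product(*search_space.values()):
    run_config = dict(zip(search_space.keys(), values))
    print(run_config)  # in the UI this would enqueue one experiment
```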
As a user I should only pay a performance hit on charts I am actively looking at. In particular, the hashtag query is currently expensive as the Twitter dataset gets...
Per the [recent paper from Meta](https://arxiv.org/abs/2404.19737), it appears that models that predict multiple future tokens can exhibit significantly greater sample efficiency than models trained only on next-token prediction, plus the...
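A toy sketch of the multi-token objective described in that line of work, with k output heads each predicting a different future offset (the head structure and shapes here are simplifying assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn.functional as F

def multi_token_loss(hidden, heads, input_ids, k=4):
    """hidden: (batch, seq, dim) trunk output; heads: list of k Linear(dim, vocab)."""
    losses = []
    for i, head in enumerate(heads[:k]):
        offset = i + 1  # head i predicts the token `offset` positions ahead
        logits = head(hidden[:, :-offset, :])
        targets = input_ids[:, offset:]
        losses.append(F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)))
    # Average the per-offset losses into a single training objective.
    return torch.stack(losses).mean()
```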
### 🐛 Bug

Native bfloat16 model fine-tuned with bfloat16 gets pushed to HuggingFace as float16

### To Reproduce

1. Choose a HF model like [Llama-3](https://huggingface.co/meta-llama/Meta-Llama-3-8B) with weights natively as bfloat16...
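A quick way to check the symptom described above, assuming a standard `transformers` load of the pushed model (the repo ID below is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the model exactly as pushed; torch_dtype="auto" respects the
# dtype recorded in the uploaded weights/config.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-pushed-model", torch_dtype="auto")

# Expected: torch.bfloat16 for a natively-bf16 base fine-tuned in bf16;
# the bug report indicates torch.float16 shows up instead.
print(next(model.parameters()).dtype)
```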
### 🚀 Feature

Allow epoch to be optionally used as the x-axis of training/eval charts for easier comparison between runs with different amounts of training pairs.

### Motivation

I'm often...
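The conversion involved is simple arithmetic; a sketch of mapping global steps to fractional epochs, assuming step count, batch size, gradient-accumulation factor, and dataset size are available to the charting code (function name is illustrative):

```python
def steps_to_epochs(step, batch_size, dataset_size, grad_accum=1):
    """Fractional epoch corresponding to a global optimizer step."""
    samples_seen = step * batch_size * grad_accum
    return samples_seen / dataset_size

# e.g. step 500 with batch size 8 over 10,000 training pairs -> epoch 0.4
print(steps_to_epochs(500, 8, 10_000))
```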
Thank you for all your work on this project; it's really great to have a fully OSS Llama backbone. I was excited to see the V2 version of the models...
### 🐛 Bug

Today, when attempting to upload a LoRA-trained Llama 3.1 70B model (the first time I've trained Llama 3.1), I hit the following during the eLoRA merge. Note I...
### 🐛 Bug

When uploading a model to HuggingFace with the `cpu_shard` setting (and, I believe, with any GPUs available), allocations are left resident in GPU memory after the upload. This...
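A hedged sketch of the kind of cleanup that would address the symptom, assuming the upload path holds the last live reference to the sharded model (the repo ID and surrounding flow are placeholders):

```python
import gc
import torch
from transformers import AutoModelForCausalLM

# Stand-in for the model that was sharded across CPU/GPU for upload.
model = AutoModelForCausalLM.from_pretrained("your-username/your-pushed-model")

# model.push_to_hub(...) would happen here.

# Drop the last live reference to the model, then return PyTorch's
# cached GPU blocks to the driver so other experiments can use them.
del model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```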