Todd Mostak

Results: 13 issues by Todd Mostak

# Merge Checklist
## :wrench: Issue(s) fixed:
- [ ] Author referenced issue(s) fixed by this PR:
- [ ] Fixes #0
## :smoking: Smoke Test
- [ ] Works...

### 🚀 Feature
Support for specification of a minimum learning rate.
### Motivation
Often in the research literature, minimum learning rates are set when fine-tuning a model using a cosine...

type/feature
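The requested schedule can be sketched in plain Python: a cosine decay that bottoms out at a configurable floor instead of zero. The function name and parameters below are illustrative, not part of any existing API; for reference, PyTorch's `CosineAnnealingLR` exposes the same floor as its `eta_min` argument.

```python
import math

def cosine_lr_with_floor(step: int, total_steps: int,
                         max_lr: float, min_lr: float) -> float:
    """Cosine-annealed learning rate that decays from max_lr down to
    min_lr rather than all the way to zero."""
    progress = step / total_steps                 # fraction of training done, in [0, 1]
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0 over training
    return min_lr + (max_lr - min_lr) * cosine

# Starts at max_lr, ends at the configured floor instead of zero.
print(cosine_lr_with_floor(0, 100, 1e-4, 1e-5))
print(cosine_lr_with_floor(100, 100, 1e-4, 1e-5))
```

At step 0 the schedule yields the peak rate; at the final step it returns the floor, which is the behavior the issue asks for.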

### 🚀 Feature
Add the capability to the UI to kick off a grid search over a set of hyperparameters (with specified search increments for continuous parameters, and specified attributes for...

type/feature
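A minimal sketch of what such a grid search could expand to, in plain Python. The parameter names and search space below are illustrative assumptions, not taken from the actual UI:

```python
from itertools import product

def grid(search_space: dict) -> list:
    """Expand {param: [values]} into one config dict per combination."""
    names = list(search_space)
    return [dict(zip(names, combo)) for combo in product(*search_space.values())]

search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4],  # continuous param, sampled at fixed increments
    "lora_rank": [8, 16],                 # discrete attribute with listed candidates
}

configs = grid(search_space)
print(len(configs))  # 6 combinations (3 learning rates x 2 ranks)
```

Each resulting config dict would then be handed to a training run; the UI feature amounts to building `search_space` from user input and scheduling one run per combination.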

As a user, I should only pay a performance hit for charts I am actively looking at. In particular, the hashtag query is currently expensive as the Twitter dataset gets...

enhancement

Per the [recent paper from Meta](https://arxiv.org/abs/2404.19737), it appears that models that predict multiple future tokens can exhibit significantly greater sample efficiency than models trained only on next-token prediction, plus the...

### 🐛 Bug
A native bfloat16 model fine-tuned in bfloat16 gets pushed to HuggingFace as float16.
### To Reproduce
1. Choose a HF model like [Llama-3](https://huggingface.co/meta-llama/Meta-Llama-3-8B) with weights natively in bfloat16...

type/bug
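The fix presumably amounts to preserving the source dtype on upload rather than defaulting to float16. A minimal sketch of that selection logic follows; the helper and its parameters are hypothetical, not the project's actual API (in Transformers, the dtype is typically carried via the model config's `torch_dtype` field):

```python
from typing import Optional

def resolve_push_dtype(source_dtype: str, requested: Optional[str] = None) -> str:
    """Pick the dtype for an uploaded checkpoint: honor an explicit request,
    otherwise preserve the dtype the base model ships with, instead of
    silently defaulting to float16."""
    return requested if requested is not None else source_dtype

# A natively bfloat16 model fine-tuned in bfloat16 should round-trip as
# bfloat16, and be converted only when the user explicitly asks for it.
print(resolve_push_dtype("bfloat16"))             # bfloat16
print(resolve_push_dtype("bfloat16", "float16"))  # float16
```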

### 🚀 Feature
Allow epoch to be optionally used as the x-axis of training/eval charts, for easier comparison between runs with different numbers of training pairs.
### Motivation
I'm often...

type/feature
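Plotting by epoch only requires the step-to-epoch conversion sketched below; `batch_size` and `num_train_pairs` are assumed to be known per run:

```python
def steps_to_epochs(step: int, batch_size: int, num_train_pairs: int) -> float:
    """Convert a global step count into (fractional) epochs, so runs trained
    on different dataset sizes share a comparable x-axis."""
    return step * batch_size / num_train_pairs

# Two runs at the same step can sit at very different points in their data:
print(steps_to_epochs(100, 32, 3200))   # 1.0  (one full pass)
print(steps_to_epochs(100, 32, 12800))  # 0.25 (a quarter pass)
```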

Thank you for all your work on this project; it's really great to have a fully OSS Llama backbone. I was excited to see the V2 version of the models...

### 🐛 Bug
Today, when attempting to upload a LoRA-trained Llama 3.1 70B model (the first time I've trained Llama 3.1), I hit the following during the eLoRA merge. Note I...

type/bug

### 🐛 Bug
When uploading a model to HuggingFace using the `cpu_shard` setting (and, I believe, with any available GPUs), allocations are left resident in GPU memory after the upload. This...

type/bug