tginart

Results 11 issues of tginart

I am trying to follow the Quickstart guide on the mosaicml/pytorch docker image and am running into issues when running the exact commands. The training step is broken. In particular, there...

Tried setting max_seq_len to 5k and added alibi to attn config with Triton flash attn. Otherwise using: scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml. Finetuning seems to be working but it does emit this sketchy log...

**Describe the bug** Running the Pythia-7B fine-tune script on 4 x A10 (24GB each). Seems like an issue with seq len: ``` Token indices sequence length is longer than the specified...

When I run with an eval set, I only get metrics/eval. I am wondering if there is a way to configure llm-foundry via yaml to also compute loss/eval in the...

Not an issue, just a question. Does llm-foundry automatically handle eos tokens or should we manually add them into our text data to denote? For example, if we are loading...

TL;DR -- Fix error behavior for models initialized with 'cuda' device map # What does this PR do? If a model is initialized with the 'cuda' device map, the...

### Describe the bug I have been running lm-eval-harness a lot, which has resulted in an API rate limit. This seems strange, since all of the data should be cached...

Does torchtune support multi-node training? For example, in a SLURM environment? If so, would it be possible to get an example config?

discussion
distributed

My understanding is that the full multi-gpu fine-tuning doesn't yet support learning rate schedules. Would it be possible to add support for this? Even basic ones, such as linear warmup...

best practice
better engineering
triaged

Bug on installation:
```
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ pip install torchao
Requirement already satisfied: torchao in /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages (0.6.1)
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ tune --help
Traceback (most recent call last):
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 16, in...
```