tginart issues

Results: 11 issues of tginart

Broken on docker image?

I am trying to follow the Quickstart guide on the mosaicml/pytorch docker image and running into issues when trying the exact commands. The training step is broken. In particular, there...

Finetune of scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml with alibi and triton emits sketchy log

Tried setting max_seq_len to 5k and added alibi to attn config with Triton flash attn. Otherwise using: scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml. Finetuning seems to be working but it does emit this sketchy log...

Token indices sequence length is longer than the specified maximum sequence length for this model (4158 > 2048)

**Describe the bug** Running the Pythia-7B fine-tune script on 4 x A10 (24GB each). Seems like an issue with seq len: Token indices sequence length is longer than the specified...

Configure eval to give 'loss/eval' that is analogous to 'loss/train'

When I run with an eval set, I only get metrics/eval. I am wondering if there is a way to configure llm-foundry via yaml to also compute loss/eval in the...

eos tokens

Not an issue, just a question. Does llm-foundry automatically handle eos tokens or should we manually add them into our text data to denote? For example, if we are loading...

Update accelerator.py

TL;DR -- Fix error behavior for models initialized with 'cuda' device map # What does this PR do? If a model is initialized with the 'cuda' device map, the...

load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors

### Describe the bug I have been running lm-eval-harness a lot, which has resulted in an API rate limit. This seems strange, since all of the data should be cached...

Does torchtune support multi-node training?

Does torchtune support multi-node training? For example, in a SLURM environment? If so, would it be possible to get an example config?

discussion

distributed

adding support for LR schedule for full distributed finetune

My understanding is that the full multi-gpu fine-tuning doesn't yet support learning rate schedules. Would it be possible to add support for this? Even basic ones, such as linear warmup...

best practice

better engineering

triaged
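
The linear warmup mentioned in the learning-rate schedule request above can be sketched as a function that maps an optimizer step to a multiplier on the base learning rate. This is only an illustrative sketch, not torchtune's implementation: the function name `linear_warmup_factor`, the `warmup_steps` parameter, and the example base learning rate are all hypothetical.

```python
def linear_warmup_factor(step: int, warmup_steps: int) -> float:
    """Return the multiplier applied to the base learning rate at `step`.

    Hypothetical helper for illustration only. The factor grows linearly
    from 1/warmup_steps up to 1.0 and then stays at 1.0.
    """
    if warmup_steps <= 0:
        # No warmup requested: always use the full base learning rate.
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


# Example: effective learning rate over the first few steps.
base_lr = 2e-5  # assumed value, not taken from any recipe
schedule = [base_lr * linear_warmup_factor(s, warmup_steps=4) for s in range(6)]
```

In PyTorch, a function of this shape can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies each parameter group's initial learning rate by the returned factor. Whether and how a given fine-tuning recipe exposes such a scheduler is not stated in the issue.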

BUG: Installation broken on torchao

Bug on installation:
```
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ pip install torchao
Requirement already satisfied: torchao in /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages (0.6.1)
(tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ tune --help
Traceback (most recent call last):
  File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 16, in...
```