tginart
I am trying to follow the Quickstart guide on the mosaicml/pytorch docker image and am running into issues when trying the exact commands. The training step is broken. In particular, there...
Tried setting max_seq_len to 5k and added alibi to attn config with Triton flash attn. Otherwise using: scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml. Finetuning seems to be working but it does emit this sketchy log...
**Describe the bug** Running the Pythia-7B fine-tune script on 4 x A10 (24GB each). Seems like an issue with seq len: _``` Token indices sequence length is longer than the specified...
When I run with an eval set, I only get metrics/eval. I am wondering if there is a way to configure llm-foundry via yaml to also compute loss/eval in the...
Not an issue, just a question. Does llm-foundry automatically handle eos tokens or should we manually add them into our text data to denote? For example, if we are loading...
TL;DR -- Fix error behavior for models initialized with 'cuda' device map # What does this PR do? If a model is initialized with the 'cuda' device map, the...
### Describe the bug I have been running lm-eval-harness a lot, which has resulted in an API rate limit. This seems strange, since all of the data should be cached...
Does torchtune support multi-node training? For example, in a SLURM environment? If so, would it be possible to get an example config?
My understanding is that the full multi-gpu fine-tuning doesn't yet support learning rate schedules. Would it be possible to add support for this? Even basic ones, such as linear warmup...
Bug on installation: ``` (tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ pip install torchao Requirement already satisfied: torchao in /fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages (0.6.1) (tginart-0001) aginart@ip-10-1-89-181:~/dev/fun_projects/generation_projects$ tune --help Traceback (most recent call last): File "/fsx/home/aginart/miniconda3/envs/tginart-0001/lib/python3.11/site-packages/torchtune/__init__.py", line 16, in...