jacobfulano
@mvpatel2000 this is relevant and the branch is updated. Please feel free to approve!
We have not battle-tested MPT-7B finetuning on g5.12xlarge (A10) instances, as most of our internal benchmarking was done on A100s. You might find this thread helpful: #82; they...
Are you planning on adding a jsonl file with a generated dataset, as well as a yaml to make it part of our ICL suite?
Hi @NarenZen, can you provide more details on your hardware and docker image? Have you successfully saved checkpoints? Are you using multiple nodes?
This is a great PR! We are looking it over and will get back to you with more detailed questions/requests soon.
A quick fix here is to get the config and then pass it in to `AutoModelForMaskedLM.from_pretrained`:

```python
import torch
import transformers
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
from transformers...
```
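To make the suggestion concrete, here is a hedged sketch of the config-first pattern. The tiny `BertConfig` below is built locally purely for illustration; in practice you would load your checkpoint's config with `AutoConfig.from_pretrained` and pass it to `AutoModelForMaskedLM.from_pretrained` alongside the checkpoint name.

```python
from transformers import AutoModelForMaskedLM, BertConfig

# Illustrative only: construct a tiny BERT config locally (no download).
# For a real checkpoint you would instead do:
#   config = AutoConfig.from_pretrained("<your-checkpoint>")
#   model = AutoModelForMaskedLM.from_pretrained("<your-checkpoint>", config=config)
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# The Auto class resolves BertConfig to the matching masked-LM architecture
model = AutoModelForMaskedLM.from_config(config)
print(type(model).__name__)  # BertForMaskedLM
```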
Thanks for catching this! You are correct to use the `amp_bf16` flag. We are updating the yamls accordingly.
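For reference, a minimal sketch of where this flag lives in a trainer yaml (the `precision` key follows Composer's trainer setting; the rest of your yaml layout may differ):

```yaml
# Mixed-precision setting for the Composer trainer;
# amp_bf16 assumes bf16-capable hardware (e.g. A100-class GPUs)
precision: amp_bf16
```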
Hi @stefan-it, we did not experiment with training on sequence length 128 and then switching to 512 (as in the original BERT paper by Devlin et al. 2018). In our experiments, training MosaicBERT-Base...
Hi @mmarius, we did not specifically train MosaicBERT-Large with sequence length 512 and batch size 4096 for 70,000 steps. However, my estimate would be roughly 4x the time it takes...
If you are going any larger than that, I would recommend looking at [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry), which should have support for training encoders/embedding models soon.