jacobfulano
@mvpatel2000 this is relevant and the branch is updated. Please feel free to approve!
We have not battle-tested MPT-7B finetuning on g5.12xlarge (A10) instances, as most of our internal benchmarking was done on A100s. You might find this thread helpful: #82; they...
Are you planning on adding a jsonl file with a generated dataset, as well as a yaml to make it part of our ICL suite?
Hi @NarenZen, can you provide more details on your hardware and docker image? Have you successfully saved checkpoints? Are you using multiple nodes?
This is a great PR! We are looking it over and will get back to you with more detailed questions/requests soon.
A quick fix here is to get the config and then pass it in to `AutoModelForMaskedLM.from_pretrained`:

```python
import torch
import transformers
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
from transformers...
```
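To make the suggestion concrete, here is a hedged sketch of the config-first pattern. The tiny `BertConfig` below is built locally purely for illustration; in practice you would load your checkpoint's config with `AutoConfig.from_pretrained` and pass it to `AutoModelForMaskedLM.from_pretrained` alongside the checkpoint name.

```python
from transformers import AutoModelForMaskedLM, BertConfig

# Illustrative only: construct a tiny BERT config locally (no download).
# For a real checkpoint you would instead do:
#   config = AutoConfig.from_pretrained("<your-checkpoint>")
#   model = AutoModelForMaskedLM.from_pretrained("<your-checkpoint>", config=config)
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# The Auto class resolves BertConfig to the matching masked-LM architecture
model = AutoModelForMaskedLM.from_config(config)
print(type(model).__name__)  # BertForMaskedLM
```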
Thanks for catching this! You are correct to use the `amp_bf16` flag. We are updating the yamls accordingly.
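For reference, a minimal sketch of where this flag lives in a trainer yaml (the `precision` key follows Composer's trainer setting; the rest of your yaml layout may differ):

```yaml
# Mixed-precision setting for the Composer trainer;
# amp_bf16 assumes bf16-capable hardware (e.g. A100-class GPUs)
precision: amp_bf16
```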
Hi @stefan-it, we did not experiment with training on sequence length 128 and then switching to 512 (as in the original BERT paper by Devlin et al. 2018). In our experiments, training MosaicBERT-Base...
Hi @mmarius, we did not specifically train MosaicBERT-Large with sequence length 512 and batch size 4096 for 70,000 steps. However, my estimate would be roughly 4x the time it takes...
If you are going any larger than that, I would recommend looking at [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry), which should have support for training encoders/embedding models soon.