Max Idahl
> I can confirm that I only experience this issue when using ZeRO-3; ZeRO-2 works fine. I just ran into the same error, and can confirm switching from ZeRO-3...
> > I can confirm the same error when finetuning Mistral with ChatML format and DeepSpeed ZeRO-3.
> >
> > ```
> > loading model
> > Traceback (most recent call last):
> > ...
> > ```
Here is a working example you can try:

```python
from functools import partial

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import LlamaTokenizer, LlamaForCausalLM
...
```
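The snippet above is cut off, but it imports `transformer_auto_wrap_policy`, which is typically bound to the model's transformer-block class via `functools.partial` and passed to FSDP as `auto_wrap_policy`. A minimal sketch of the mechanics, using a hypothetical stand-in `Block` class rather than the original (truncated) model code:

```python
from functools import partial

import torch.nn as nn
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


# Stand-in for a real transformer block such as LlamaDecoderLayer
# (hypothetical class, used here only to show the policy mechanics).
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)


# Bind the layer class; FSDP calls the resulting policy per submodule.
policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})

# Positional signature is (module, recurse, nonwrapped_numel): while
# recursing, the policy always says "keep traversing"; at the wrap
# decision it answers True only for instances of the registered class.
print(policy(Block(), True, 0))           # recursing: keep traversing
print(policy(Block(), False, 0))          # a Block: wrap it
print(policy(nn.Linear(8, 8), False, 0))  # not a Block: leave unwrapped
```

With a real model you would bind the actual decoder-layer class (e.g. `LlamaDecoderLayer`) and pass `auto_wrap_policy=policy` to the `FSDP(...)` constructor, which is what the imports in the comment above suggest.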
Just to document my experience getting DDP + MP (2x2 on 4 GPUs) to work with Accelerate (via the HF Trainer): I modified the current main branch to initialize the...
> @maxidl can you share your modified code? Curious what those exceptions are that exist for "no good reason"

@muellerzr I do think these errors are necessary if one does...
Sure, that sounds great. Once the changes are in (no rush with that), I might create a tutorial-style GitHub repo for it and do some benchmarking, to be shared via...
Including fastwarc would be nice. However, in the current text extraction pipeline for fineweb, the warc reader is not a bottleneck (