Max Idahl
> I can confirm that I only experience this issue when using ZeRO-3; ZeRO-2 works fine. I just ran into the same error, and can confirm switching from ZeRO-3...
> > I can confirm the same error when finetuning Mistral with ChatML format and DeepSpeed ZeRO-3.
> >
> > ```
> > loading model
> > Traceback (most recent call last):
> > ...
> > ```
Here is a working example you can try:

```python
from functools import partial

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import LlamaTokenizer, LlamaForCausalLM
...
```
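The snippet above is cut off, but it imports `transformer_auto_wrap_policy`, which is typically bound to the model's transformer-block class via `functools.partial` and passed to FSDP as `auto_wrap_policy`. A minimal sketch of the mechanics, using a hypothetical stand-in `Block` class rather than the original (truncated) model code:

```python
from functools import partial

import torch.nn as nn
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


# Stand-in for a real transformer block such as LlamaDecoderLayer
# (hypothetical class, used here only to show the policy mechanics).
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)


# Bind the layer class; FSDP calls the resulting policy per submodule.
policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})

# Positional signature is (module, recurse, nonwrapped_numel): while
# recursing, the policy always says "keep traversing"; at the wrap
# decision it answers True only for instances of the registered class.
print(policy(Block(), True, 0))           # recursing: keep traversing
print(policy(Block(), False, 0))          # a Block: wrap it
print(policy(nn.Linear(8, 8), False, 0))  # not a Block: leave unwrapped
```

With a real model you would bind the actual decoder-layer class (e.g. `LlamaDecoderLayer`) and pass `auto_wrap_policy=policy` to the `FSDP(...)` constructor, which is what the imports in the comment above suggest.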
Just to document my experience getting DDP + MP (2x2 on 4 GPUs) to work with Accelerate (via the HF Trainer): I modified the current main branch to initialize the...
> @maxidl can you share your modified code? Curious what those exceptions are that exist for "no good reason"

@muellerzr I do think these errors are necessary if one does...
Sure, that sounds great. Once the changes are in (no rush with that), I might create a tutorial-style GitHub repo for it and do some benchmarking, to be shared via...
Including fastwarc would be nice. However, in the current text extraction pipeline for fineweb, the warc reader is not a bottleneck (