wangyanhui666
I skip this wrapping step; wrapping the dataloader causes an error. https://github.com/mosaicml/streaming/issues/789#issuecomment-2405432617
In my code, if I use accelerate to wrap the dataloader again, it causes a deadlock. I think this is because the streaming dataset is already split for each...
Reference: https://huggingface.co/docs/accelerate/package_reference/torch_wrappers
> @wangyanhui666 so is training successful if you don't wrap the dataloader, as mentioned in some previous issues?

Yes, training is successful. I use 1 node with 4 GPUs to train; I have not tested multi...
I think this is a bug in PyTorch; they are working on a fix: https://github.com/pytorch/pytorch/pull/138354#issue-2598184802 Before they fix it, we should use PyTorch
I tried PyTorch 2.4.1 and it also has this bug, so disabling the cuDNN attention backend in the training code may be a good solution.
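One way to disable just the cuDNN SDPA backend, sketched below. The helper name is mine; the `torch.backends.cuda` setters are the standard toggles for `scaled_dot_product_attention` backends, and the `hasattr` guard covers PyTorch builds that do not expose the cuDNN toggle yet:

```python
import torch

def disable_cudnn_attention() -> None:
    """Turn off the cuDNN SDPA backend while leaving the flash,
    memory-efficient, and math backends enabled, so attention falls
    back to a backend that does not hit this bug."""
    # Guarded because older PyTorch builds lack the cuDNN toggle.
    if hasattr(torch.backends.cuda, "enable_cudnn_sdp"):
        torch.backends.cuda.enable_cudnn_sdp(False)
    # Keep the other backends available.
    torch.backends.cuda.enable_flash_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(True)
    torch.backends.cuda.enable_math_sdp(True)

disable_cudnn_attention()
```

Calling this once at startup, before any attention ops run, should be enough; newer PyTorch versions also offer the `torch.nn.attention.sdpa_kernel` context manager for scoping the same restriction to a single block.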