Andcircle

Results: 30 comments of Andcircle

@HamidShojanazeri thanks for your response. cuda 12.2, nccl 2.19.3, torch 2.2.0, transformers 4.37.2, trl 0.7.10, accelerate 0.27.2, bitsandbytes 0.42.0

@HamidShojanazeri With cuda 12.1.1 and torch 2.2.1 I still get exactly the same error at the same step. Any hints or guidance on how to debug this type of situation? I tried adding TORCH_CUDA_SANITIZER=1,...
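One hedged note on the sanitizer: PyTorch's CUDA Sanitizer (CSAN) reads `TORCH_CUDA_SANITIZER` when torch is imported, so the variable must be set before the import (or exported in the shell that launches the job), otherwise it silently has no effect:

```python
import os

# CSAN is controlled by an environment variable checked at torch import
# time, so set it before importing torch.
os.environ["TORCH_CUDA_SANITIZER"] = "1"

# import torch  # only import torch after the variable is set
print(os.environ["TORCH_CUDA_SANITIZER"])  # → 1
```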

@HamidShojanazeri I did add this env var, but didn't get any extra info; I also used the sanitizer =) This is the smallest code snippet with which I can reproduce it: ```import os import wandb import torch from...

@HamidShojanazeri thanks for your reply. This is just a demo snippet; we actually use MP + DDP. FSDP without QLoRA can't save that much memory, since we have relatively long...

@ktlKTL using your package versions, I got a new error: `ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.` Any hints?
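For context, a simplified, hypothetical sketch of the kind of shape guard that produces this message (the quoted wording, including the "this look incorrect" typo, is the library's own error string; `set_weight` below is an illustration, not the real implementation). The point is that the in-memory parameter was left empty, e.g. initialized on the meta device under low-memory or quantized loading, so it cannot receive the full [32000, 4096] embedding from the checkpoint:

```python
def set_weight(current_shape, new_shape):
    """Illustrative shape check: refuse to copy a checkpoint tensor
    into a parameter whose allocated shape does not match."""
    if current_shape != new_shape:
        raise ValueError(
            f"Trying to set a tensor of shape {new_shape} in \"weight\" "
            f"(which has shape {current_shape}), this look incorrect."
        )
    return True

# A meta/empty parameter has shape (0,), hence the mismatch above.
print(set_weight((32000, 4096), (32000, 4096)))  # → True
```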

Hey @tmm1, sorry to bother you. Still facing the same issue: ``` MAX_JOBS=4 pip install -U flash-attn --no-build-isolation Collecting flash-attn Using cached flash_attn-2.1.0.tar.gz (2.2 MB) Preparing metadata (pyproject.toml) ... error error:...

@younesbelkada All the test cases above use device_map="auto", and that also works for me. BUT: if I use device_map={'':torch.cuda.current_device()}, the error shows up again: ``` Traceback (most recent call last):...
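A hedged note on the two mappings: in the `device_map` accepted by `from_pretrained`, the empty-string key addresses the root module, so `{'': index}` pins the entire model to one device, whereas `"auto"` lets accelerate shard it across whatever is visible. A minimal sketch (`make_single_device_map` is a hypothetical helper):

```python
def make_single_device_map(device_index: int) -> dict:
    # The '' key maps the root module, i.e. the whole model, so every
    # submodule is placed on the given device index.
    return {"": device_index}

# Equivalent in spirit to device_map={'': torch.cuda.current_device()}.
print(make_single_device_map(0))  # → {'': 0}
```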

@younesbelkada Even when setting device_map="auto", if there is only 1 GPU I still face the error: ``` Traceback (most recent call last): File "train1.py", line 124, in trainer = SFTTrainer( File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py",...
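One workaround sketch, assuming the error only appears on single-GPU runs: choose the `device_map` from the visible GPU count before calling `from_pretrained` (`choose_device_map` is a hypothetical helper; in a real script the count would come from `torch.cuda.device_count()`):

```python
def choose_device_map(num_gpus: int):
    """Pin everything to the lone GPU when only one is visible;
    otherwise let accelerate shard the model across devices."""
    if num_gpus <= 1:
        return {"": 0}
    return "auto"

print(choose_device_map(1))  # → {'': 0}
print(choose_device_map(4))  # → auto
```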