fsdp_qlora

Bug when fine-tuning with FSDP on multiple nodes

Open • batman-do opened this issue 2 years ago • 1 comment

[image] How can I fix this?

batman-do, Mar 11 '24 07:03

Can you share the training command you used with full arguments, and also provide versions of the following libraries:

accelerate                
bitsandbytes            
datasets                  
hqq                       
hqq-aten              
huggingface-hub 
llama-recipes       
peft                      
safetensors         
tokenizers           
torch                    
transformers       
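
One quick way to gather these versions is sketched below. It is only a sketch: it assumes the names in the list above match the installed pip distribution names, and the helper `collect_versions` is a hypothetical name, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the list above (assumed to match pip names).
PACKAGES = [
    "accelerate", "bitsandbytes", "datasets", "hqq", "hqq-aten",
    "huggingface-hub", "llama-recipes", "peft", "safetensors",
    "tokenizers", "torch", "transformers",
]


def collect_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name:<16} {ver}")
```

For a multi-node job, running this on every node and comparing the output should show whether the environments differ.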

You are likely using an older version of bitsandbytes. The quant_storage argument was introduced in this commit: https://github.com/TimDettmers/bitsandbytes/commit/dcfb6f81433e37a8546f7dab3f648eaf858b29ff.

Try pip install -U bitsandbytes and retry. Also, for multi-node training, make sure every node has the up-to-date bitsandbytes version, ideally by using the same environment on all nodes.
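
To confirm that the installed bitsandbytes build accepts quant_storage, a check along the following lines could be run on each node. This is a minimal sketch that assumes the argument is exposed on the constructor of `bitsandbytes.nn.Linear4bit`; the helpers `accepts_kwarg` and `bnb_supports_quant_storage` are hypothetical names used only for illustration.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` declares a parameter called `name`."""
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some callables do not expose an inspectable signature.
        return False


def bnb_supports_quant_storage():
    """Return True/False, or None if bitsandbytes cannot be imported here."""
    try:
        import bitsandbytes as bnb
    except Exception:
        # Not installed, or the import failed (for example, no usable CUDA setup).
        return None
    # Assumption: quant_storage is a parameter of Linear4bit.__init__.
    return accepts_kwarg(bnb.nn.Linear4bit.__init__, "quant_storage")


if __name__ == "__main__":
    print("quant_storage supported:", bnb_supports_quant_storage())
```

If this prints False on any node, that node is probably still running an older bitsandbytes build.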

KeremTurgutlu, Mar 12 '24 12:03