Anyi Li

5 comments by Anyi Li

BTW, do you have an ETA? We are using Redis as our primary DB at the moment.

If I set resume_from_checkpoint=saved_model/checkpoint-1000, it throws an exception like `ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0,...

@KKcorps that works for me

For now, I used the exact same qlora.py and the same training parameters from the ./script folder, and trained on 7b. I still got 319977229 as the size of adapter_model.bin. While...

I found the reason. The model is loaded as bf16, so the adapter weights are also saved as bf16. However, the script given by the author used bf16,...
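To illustrate why the saved dtype matters for the file size, here is a rough sketch in plain Python. It assumes the file size is dominated by the raw weight tensors and ignores serialization overhead. The helper name `approx_param_count` and the dtype table are made up for illustration and are not part of qlora.py.

```python
# Bytes used per element for a few common dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def approx_param_count(file_size_bytes: int, dtype: str) -> int:
    """Rough upper bound on the number of stored parameters.

    Hypothetical helper: assumes every byte in the file is weight data,
    which overestimates slightly because of serialization overhead.
    """
    return file_size_bytes // BYTES_PER_ELEMENT[dtype]


# Size of adapter_model.bin reported in the comment above.
adapter_size = 319977229

for dtype in BYTES_PER_ELEMENT:
    print(dtype, approx_param_count(adapter_size, dtype))
```

The same byte count corresponds to about twice as many parameters at 2 bytes per element as at 4, so two adapters with identical shapes can differ in file size by roughly 2x depending on the dtype they were saved in.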