Christopher Chou
To fix the "found optimizer but no scheduler" issue, simply remove the optimizer block from the deepspeed config. This was a new change in the new version of transformers...
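For reference, a minimal sketch of what that config can look like once the optimizer (and scheduler) blocks are dropped, so the HF Trainer creates both itself. The specific keys and `"auto"` values below are illustrative placeholders, not copied from the actual config:

```python
import json

# DeepSpeed config with no "optimizer" / "scheduler" block; the transformers
# DeepSpeed integration fills in the "auto" values from TrainingArguments.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```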
My deepspeed config and training script is the same as listed [here](https://github.com/lm-sys/FastChat/blob/main/docs/training.md)
Yes, reloading from the new checkpoint still works, and the checkpoint size for the `adapter_model` was 17M for me. I was able to test `apply_lora` and it works with the...
I discovered and tested (following [this thread](https://discuss.huggingface.co/t/trainer-option-to-disable-saving-deepspeed-checkpoints/13262/4)) that the program will not create the `pytorch_model.bin` file if we set `"stage3_gather_16bit_weights_on_model_save": false` in `ds_config.json`, and resuming from the checkpoint still works. So, now the...
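As a small sketch, that flag lives under the `zero_optimization` block of `ds_config.json`; flipping it could look roughly like this (file path and surrounding keys are placeholders):

```python
import json

# Disable gathering the full 16-bit weights at save time, so DeepSpeed only
# writes its sharded checkpoint and no consolidated pytorch_model.bin.
with open("ds_config.json") as f:
    ds_config = json.load(f)

ds_config.setdefault("zero_optimization", {})[
    "stage3_gather_16bit_weights_on_model_save"
] = False

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```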
I tested resuming from a checkpoint after deleting all of the `zero_pp_rank_x_mp_rank_00_model_states.pt` files, and I ran into an AssertionError when loading from the checkpoint, presumably because it is still looking for those files.
I was working on this PR, but the best I can do so far is load from an adapter, and we lose the LR schedule, optimizer...
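For context, a rough sketch of that fallback, assuming PEFT: load only the adapter weights back onto the base model and start training again with a fresh optimizer and LR schedule (the model path, checkpoint path, and hyperparameters here are placeholders, not taken from the PR):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Attach the saved LoRA adapter to the base model; this recovers the trained
# adapter weights but not the trainer state (optimizer, LR scheduler, step count).
base = AutoModelForCausalLM.from_pretrained("llama-7b")
model = PeftModel.from_pretrained(base, "output/checkpoint-1000", is_trainable=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output-resumed", learning_rate=2e-5),
    train_dataset=None,  # placeholder: plug in the real dataset
)
# trainer.train() then starts from step 0 with a new optimizer and schedule.
```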
I heard that people also get good results fine-tuning the "fc1" and "fc2" modules, based on [this paper](https://arxiv.org/pdf/2110.04366.pdf): > "we conclude that modifying head attention shows the best results when the parameter...
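A minimal sketch of targeting those modules with PEFT, assuming a model whose MLP layers are actually named `fc1`/`fc2` (e.g. OPT-style models; LLaMA names them `gate_proj`/`up_proj`/`down_proj` instead), and with placeholder LoRA hyperparameters:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Target the MLP layers instead of (or in addition to) the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["fc1", "fc2"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```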
Yes, I believe it does add additional parameters.
I think you should use `apply_lora.py` to merge the adapter `project-baize/baize-lora-7b` into the base model `llama-7b`.
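Conceptually that merge looks roughly like the following PEFT-based sketch; the paths are placeholders, and `apply_lora.py` in the repo wraps the same steps behind its own CLI arguments:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, apply the LoRA adapter, then fold the adapter weights
# into the base weights so the result is a plain standalone model.
base = AutoModelForCausalLM.from_pretrained("llama-7b")
model = PeftModel.from_pretrained(base, "project-baize/baize-lora-7b")
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("llama-7b")
merged.save_pretrained("baize-7b-merged")
tokenizer.save_pretrained("baize-7b-merged")
```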
Closing, as this is resolved by #112.