Qingyang Wu

Results 3 issues of Qingyang Wu

https://github.com/microsoft/DialoGPT/blob/b85558dea5391f83b20120d6c93b9f79fcc72311/reddit_extractor/src/reddit.py#L108-L112

This line does not save the optimizer state correctly when using FSDP: https://github.com/huggingface/transformers/blob/88399476c3892435395618ed37993176dbb0de73/src/transformers/trainer.py#L2383

It should use FSDP's `full_optim_state_dict` to collect the optimizer states from the different processes.

```python
FSDP.full_optim_state_dict(self.model, self.optimizer)
```
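For context, here is a minimal sketch of how the gathered state could be written to disk. It is not the Trainer's actual code: the helper name `save_full_optim_state` and its `path` argument are hypothetical, and it assumes the process group is already initialized and `model` is wrapped in FSDP. It relies on `full_optim_state_dict` being a collective call that, by default (`rank0_only=True`), returns the consolidated dict only on rank 0.

```python
def save_full_optim_state(model, optimizer, path):
    """Gather the full optimizer state from all FSDP ranks; save it on rank 0.

    Hypothetical helper for illustration, not part of transformers.
    """
    # Imported inside the function so the sketch can be loaded on a machine
    # without a distributed PyTorch install.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Collective call: every rank must enter it, even though only rank 0
    # receives the consolidated state dict by default (rank0_only=True).
    full_osd = FSDP.full_optim_state_dict(model, optimizer)

    # Only rank 0 holds the full state, so only rank 0 writes the file.
    if dist.get_rank() == 0:
        torch.save(full_osd, path)
```

Calling `optimizer.state_dict()` directly on each rank would only capture that rank's shard, which is why the gather step is needed. As far as I know, newer PyTorch releases mark `full_optim_state_dict` as deprecated in favor of `FSDP.optim_state_dict`, so the exact call may need adjusting depending on the installed version.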

### ⚠️ Please check that this feature request hasn't been suggested before.

- [x] I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) and didn't find any similar feature requests.
- [x] I searched...

enhancement
help wanted