Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.
What does this PR do?
Added support for SDXL fine-tuning on Ascend NPU and fixed a bug that caused a hang when saving models under the DeepSpeed distributed framework. DeepSpeed requires weights to be saved on every device; saving them only on the main process causes the other ranks to block.
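The save-hang fix can be sketched as follows. This is a minimal illustration, not the actual diff: the helper name is hypothetical, and the real code checks the Accelerate `DistributedType` on the `Accelerator` object.

```python
def should_save_on_this_rank(distributed_type: str, is_main_process: bool) -> bool:
    """Decide whether the current rank should participate in a checkpoint save.

    DeepSpeed shards model/optimizer state across ranks, so every rank must
    join the save (it is a collective operation). Guarding the save with
    `is_main_process` leaves the other ranks waiting in the collective,
    which looks like a hang. Other backends save only on the main process.
    """
    if distributed_type == "DEEPSPEED":
        return True
    return is_main_process
```

In the training script this means the save call is reached unconditionally when DeepSpeed is active, instead of only inside the `if accelerator.is_main_process:` branch.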
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline?
- [ ] Did you read our philosophy doc (important for complex PRs)?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
I fine-tuned SDXL on Ascend NPU, and the results are good. I hope diffusers can support more devices.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
- Training examples: @sayakpaul
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I found some errors in the checks. How can I fix them?
examples/controlnet/train_controlnet_sdxl.py:16:1: I001 [*] Import block is un-sorted or un-formatted
examples/text_to_image/train_text_to_image_lora_sdxl.py:18:1: I001 [*] Import block is un-sorted or un-formatted
src/diffusers/models/activations.py:16:1: I001 [*] Import block is un-sorted or un-formatted
src/diffusers/models/attention_processor.py:14:1: I001 [*] Import block is un-sorted or un-formatted
It's strange because I didn't modify the code here.
You can do the following:
- Create a fresh Python environment.
- Run `pip install -e ".[quality]"` from the root of `diffusers`.
- Run `make style && make quality`.
I've already fixed the code formatting issues in the checks.
@sayakpaul I'm ok with this PR if you think it is needed :)
Thanks, Yiyi.
I am alright with the PR because the number of changes is extremely minimal.
I've separated the NPU flash attention into a module and implemented a switch control using parameters.
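The switch can be illustrated roughly as below. This is a sketch, not the PR's exact code: the function and the runtime-availability flag are hypothetical, though the processor class names mirror diffusers' naming.

```python
def get_attention_processor(enable_npu_flash_attention: bool, npu_available: bool) -> str:
    """Select an attention processor based on a user-facing switch plus
    runtime availability of the NPU backend (names are illustrative)."""
    if enable_npu_flash_attention:
        if not npu_available:
            # Fail loudly rather than silently falling back to CPU/GPU attention.
            raise ValueError(
                "NPU flash attention was requested but torch_npu is not available."
            )
        return "AttnProcessorNPU"
    # Default scaled-dot-product attention path.
    return "AttnProcessor2_0"
```

Keeping the NPU path behind an explicit parameter means the default behavior on CUDA/CPU is unchanged, and the NPU processor is only selected when the user opts in.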
I've tested it and it works.
@sayakpaul
Thanks for working on this.
For me, the following would be nice to add before we merge:
- Documentation -- add an entry about the NPU processor to https://huggingface.co/docs/diffusers/main/en/api/attnprocessor
- Test: Similar to https://github.com/huggingface/diffusers/blob/26a7851e1e0b18da746d6ae80bb105050f7187e0/tests/models/test_modeling_common.py#L307
@yiyixuxu could you review the changes introduced to the core modules of the library and comment?
Sure, I'll add unit tests and documentation later.
I've updated the code. @sayakpaul
Hi @sayakpaul. I noticed the PR is still open. Does the code still need review from others?
Yes, it needs a review from our core maintainer @yiyixuxu.