
[Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU

ustcuna opened this issue 1 year ago · 1 comment

Hi, this pipeline aims to speed up inference of AnimateDiff on Intel Xeon CPUs on Linux. It is much like the previous one I proposed for SDXL, which was merged in #6683.

By using this optimized pipeline, we can get about 1.5-2.2x performance acceleration with BFloat16 on fifth-generation Intel Xeon CPUs, code-named [Emerald Rapids]. It is also recommended to run on PyTorch/IPEX v2.0 and above to get the best performance boost. The main benefits, the same as in our previous PR, are:

  • For PyTorch/IPEX v2.0 and above, it benefits from the MHA optimization with Flash Attention and from TorchScript mode optimization in IPEX.
  • For PyTorch/IPEX v1.13, it benefits from TorchScript mode optimization in IPEX.
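The two paths above boil down to the same pattern: optionally rewrite the model with IPEX, then trace and freeze it with TorchScript. A minimal sketch of that pattern follows; it uses a toy `torch.nn.Linear` stand-in rather than the actual AnimateDiff UNet, and falls back to plain TorchScript tracing when IPEX is not installed:

```python
import torch

def optimize_for_cpu(model, example_inputs, dtype=torch.float32):
    """Sketch of the IPEX + TorchScript optimization applied to a module.

    Falls back to TorchScript-only tracing if IPEX is unavailable.
    """
    model = model.eval()
    try:
        import intel_extension_for_pytorch as ipex
        # IPEX rewrites ops (e.g. fused MHA / Flash Attention on
        # PyTorch/IPEX >= 2.0) and prepares weights for the target dtype.
        model = ipex.optimize(model, dtype=dtype)
    except ImportError:
        pass  # no IPEX: TorchScript-only path
    with torch.no_grad(), torch.amp.autocast("cpu", enabled=dtype == torch.bfloat16):
        traced = torch.jit.trace(model, example_inputs, strict=False)
        traced = torch.jit.freeze(traced)
    return traced

# Example: trace a small stand-in module (the real pipeline traces the UNet
# and wires the traced module back into the AnimateDiff pipeline).
toy = torch.nn.Linear(4, 4)
traced = optimize_for_cpu(toy, (torch.randn(1, 4),))
```

The example uses float32 for portability; on Emerald Rapids you would pass `dtype=torch.bfloat16` to exercise the BF16 autocast path.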

Below are tables showing test results for AnimateDiff-Lightning (a lightning-fast text-to-video model distilled from AnimateDiff SD1.5 v2, run through the AnimateDiff pipeline) with 1/2/4/8 steps on an Intel® Xeon® Platinum 8582C Processor (60 cores/socket, 1 socket) with data type BF16: [image: benchmark tables] Could you please help to review? Thanks!

ustcuna avatar Jun 20 '24 02:06 ustcuna

Hi @patrickvonplaten and @pcuenca, could you please help to review this PR? It uses almost the same optimization methods and code structure as the previous one I proposed for the SDXL pipeline, merged in #6683. Thanks a lot!

ustcuna avatar Jun 20 '24 02:06 ustcuna

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @a-r-r-o-w, thank you so much for the detailed review! It was thoughtful and professional, and it helps improve the code quality.🙂 With the new commit I have addressed almost all of your suggestions; could you please review again? The one exception is the adapter.from_pretrained call, which I did not change, since I intended to keep exactly the same example as the AnimateDiff-Lightning Diffusers usage in ByteDance/AnimateDiff-Lightning. Do we need to change it here to keep consistency? If so, I will change it to adapter.from_pretrained in a new commit. Again, thanks sincerely!

ustcuna avatar Jul 11 '24 03:07 ustcuna

Thank you for applying the review comments, and for the kind words! The adapter.from_pretrained call is a personal style choice and not a problem at all for community pipelines, so we don't need to change it. All tests have passed, which is great, and we can probably merge. One last thing I wanted to ask: why are we not supporting cross_attention_kwargs and added_cond_kwargs here? Would that require preparing the IP Adapter models, for example, if loaded?

a-r-r-o-w avatar Jul 11 '24 04:07 a-r-r-o-w

Hi @a-r-r-o-w, the reason we did not support cross_attention_kwargs and added_cond_kwargs here is that when leveraging torch.jit.trace to optimize the UNet, NoneType values cannot be inputs to traced functions. Even if we fed fake inputs for cross_attention_kwargs and added_cond_kwargs so the UNet traces successfully, we would still need to substitute fake inputs inside the call function whenever these arguments default to None, otherwise errors occur due to incompatible function arguments. But I think that would be confusing for users... So we decided not to support cross_attention_kwargs and added_cond_kwargs here. Do you have any better suggestions to address this? Looking forward to your opinion! Thanks!
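The NoneType limitation described above can be reproduced with a minimal sketch (a toy module standing in for the UNet, not the actual pipeline code):

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x, scale=None):
        # The `scale is None` check is resolved at trace time and
        # baked into the graph; the traced module keeps no branch.
        if scale is not None:
            x = x * scale
        return x + 1

model = Toy().eval()

# Tracing with just the tensor works: `scale` stays at its Python default.
traced = torch.jit.trace(model, (torch.ones(2, 2),))

# Tracing with an explicit None fails: example inputs must be tensors
# (or nested containers of tensors), not NoneType.
try:
    torch.jit.trace(model, (torch.ones(2, 2), None))
except Exception as e:
    print("trace failed:", type(e).__name__)
```

This is why the pipeline either drops the optional kwargs entirely or would have to feed placeholder tensors both at trace time and at call time, as discussed above.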

ustcuna avatar Jul 12 '24 08:07 ustcuna

I see, that is insightful. I do not have a deep understanding of torch.jit.trace (or TorchScript in general) due to limited experience, but I've encountered NoneType issues in the past, so it is understandable why it might be hard to support those arguments. Optional type hinting could probably help (?), but I believe it would require many diffusers core code changes, which is out of scope; or fake inputs like you mention, but it's okay not to do that here, as this is an example pipeline.

Thanks for your contribution!

a-r-r-o-w avatar Jul 12 '24 09:07 a-r-r-o-w

Thanks a lot for your understanding of this situation, which is quite a headache!😀 I would like to thank you again for reviewing and merging the code!

ustcuna avatar Jul 12 '24 09:07 ustcuna