Begin simplifying CrossAttention so that it works better on the Apple Neural Engine
Hi folks,
This is to address this issue.
I converted this CrossAttention portion with coremltools, and it does in fact remove about 4 reshape operations and a few transposes, getting down to 4 transposes and 4 reshapes remaining.
Unfortunately, it seems that is still too many to compile on the ANE.
Any ideas about what else I could do to simplify this? I took a stab at using another einsum for the attn-value matmul, but I don't think I was doing it correctly.
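For reference, here is a minimal sketch of the einsum formulation I had in mind: the q·k scores and the attn·value product each written as a single einsum, so neither matmul needs an explicit transpose or reshape. This is just an illustration with `numpy.einsum` (the shapes and names are hypothetical, not the actual diffusers code):

```python
import numpy as np

# Hypothetical shapes: (batch * heads, sequence length, head dim)
b, n, d = 2, 4, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((b, n, d))
k = rng.standard_normal((b, n, d))
v = rng.standard_normal((b, n, d))

# Attention scores via einsum: contracts over the head dim,
# so k never needs an explicit transpose
scores = np.einsum("bid,bjd->bij", q, k) * d ** -0.5

# Softmax over the key axis
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# attn @ v, also as an einsum, again with no reshape/transpose
out = np.einsum("bij,bjd->bid", attn, v)

# Matches the explicit matmul formulation
ref = attn @ v
assert np.allclose(out, ref)
```

The same two subscript strings translate directly to `torch.einsum`; the open question is whether coremltools lowers the second contraction to an ANE-friendly op.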
cc: @patrickvonplaten @pcuenca
Yeah, this is going to take more investigation. Further experimentation suggests this may not be the exact pain point for the ANE.
I know that einsum can cause problems for certain patterns. coremltools, for instance, only natively supported two einsum equations, and this one is among those that should work without issue.
But since I haven't been able to fully diagnose where the hangup is, I'll put this PR on ice.