Begin simplifying CrossAttention so that it works better on the Apple Neural Engine

Open MatthewWaller opened this issue 3 years ago • 2 comments

Hi folks,

This is to address this issue.

I converted this CrossAttention portion with coremltools, and it does in fact remove about 4 reshape operations and a few transposes, leaving 4 transposes and 4 reshapes.

Unfortunately, it seems that is still too many to compile on the ANE.

Any ideas about what else I could do to simplify this? I took a stab at using another einsum for the attn and value matmul, but I don't think I was doing it correctly.
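For anyone following along, here is a minimal sketch of what replacing both matmuls with einsum could look like. It is not the code from this PR: it uses NumPy rather than PyTorch so it runs standalone, the function names and tensor sizes are made up for illustration, and it assumes the heads are already folded into the batch dimension. `torch.einsum` accepts the same subscript strings. It only checks that the two forms agree numerically; whether the converted Core ML graph ends up with fewer transposes or reshapes is not something this shows.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_matmul(q, k, v, scale):
    # Reference form: explicit transpose of the key, then two matmuls.
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) * scale
    probs = softmax(scores, axis=-1)
    return np.matmul(probs, v)


def attention_einsum(q, k, v, scale):
    # Same computation with einsum; no explicit transpose in the source.
    # b = batch * heads, i = query positions, j = key positions, d = head dim
    scores = np.einsum("bid,bjd->bij", q, k) * scale
    probs = softmax(scores, axis=-1)
    # Attention-times-value product: contract over the key positions j.
    return np.einsum("bij,bjd->bid", probs, v)


# Hypothetical sizes, chosen only for the check below.
rng = np.random.default_rng(0)
batch_heads, seq_q, seq_kv, dim_head = 2, 4, 6, 8
q = rng.standard_normal((batch_heads, seq_q, dim_head))
k = rng.standard_normal((batch_heads, seq_kv, dim_head))
v = rng.standard_normal((batch_heads, seq_kv, dim_head))
scale = dim_head ** -0.5

out_matmul = attention_matmul(q, k, v, scale)
out_einsum = attention_einsum(q, k, v, scale)
print(np.allclose(out_matmul, out_einsum))
```

The subscripts `"bij,bjd->bid"` are the part I would double check in an attempt like this: the contracted index has to be the key position `j`, which appears last in the attention tensor and second in the value tensor.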

MatthewWaller avatar Sep 30 '22 17:09 MatthewWaller

cc: @patrickvonplaten @pcuenca

MatthewWaller avatar Sep 30 '22 17:09 MatthewWaller

The documentation is not available anymore as the PR was closed or merged.

Yeah, this is going to take more investigation. More experimenting has revealed that this may not be the exact pain point for ANE.

I know that einsum can cause problems for certain types. Only two versions were natively supported by coremltools, for instance. This is one of those two, so it should work without problems.

But since I haven't been able to fully diagnose where the hangup is, I'll put this PR on ice.

MatthewWaller avatar Oct 13 '22 20:10 MatthewWaller