ImageSharp
Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution
As @saucecontrol pointed out in his comment, we can get rid of VPERMS in the following code:
https://github.com/SixLabors/ImageSharp/blob/e2211c316daab3ae59eb85fbc189288849eb54d2/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs#L104-L112
If FMA is detected, we should allocate a 4x larger buffer and do the duplication once in ResizeKernelMap.Calculate, which should be much cheaper than doing it in every convolution:
https://github.com/SixLabors/ImageSharp/blob/e2211c316daab3ae59eb85fbc189288849eb54d2/src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs#L115-L120
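To illustrate the proposed layout, here is a minimal scalar sketch. It is not the actual ImageSharp code: `KernelDuplicationSketch`, `DuplicateWeights` and `Convolve` are hypothetical names, and the real implementation would use `Vector256<float>` loads with FMA rather than `Vector4` arithmetic. The point is only that once each weight is stored four times in a row, the weights for a pixel can be read directly from memory with no permute in the inner loop.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch, not part of ImageSharp.
static class KernelDuplicationSketch
{
    // Expands [w0, w1, ...] into [w0, w0, w0, w0, w1, w1, w1, w1, ...].
    // This is the one-time cost that would be paid when the kernel map is built.
    public static float[] DuplicateWeights(ReadOnlySpan<float> weights)
    {
        var duplicated = new float[weights.Length * 4]; // the "4x buffer"
        for (int i = 0; i < weights.Length; i++)
        {
            duplicated.AsSpan(i * 4, 4).Fill(weights[i]);
        }

        return duplicated;
    }

    // Convolves RGBA pixels with pre-duplicated weights.
    // Each group of 4 floats is already laid out per channel, so no shuffle is needed.
    public static Vector4 Convolve(ReadOnlySpan<float> duplicated, ReadOnlySpan<Vector4> pixels)
    {
        Vector4 sum = Vector4.Zero;
        for (int i = 0; i < pixels.Length; i++)
        {
            int o = i * 4;
            var w = new Vector4(duplicated[o], duplicated[o + 1], duplicated[o + 2], duplicated[o + 3]);
            sum += w * pixels[i];
        }

        return sum;
    }
}
```

The trade-off is memory: the kernel buffer grows by a factor of four, in exchange for removing the per-iteration permute from every convolution that reuses the same kernel.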