# Proposal: Value Type Kernels
### Prerequisites
- [x] I have written a descriptive issue title
- [x] I have verified that I am running the latest version of ImageSharp
- [x] I have searched open and closed issues to ensure it has not already been reported
### The problem

- There are many places in the library where we operate on relatively small matrices known at compile time using a dynamically allocated heap array (usually wrapped in `Fast2DArray`):
  - `ConvolutionProcessor`-s
  - `PatternBrush`-es
  - `ResizeProcessor` weight windows
- This approach is easy to code, but has implicit overheads with a significant negative effect on performance:
  - Indexing heap arrays increases the probability of cache misses
  - The JIT is not able to apply optimized register allocation with dynamic arrays. The data is always accessed by "read from memory" (`mov`?) CPU instructions.
  - Calculations iterate using nested `for` loops instead of being simple expressions
- All in all: the generated machine code is more complex, longer, and slower to execute than it should be
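To make the overhead concrete, here is a hypothetical sketch (the class and method names are illustrative, not the actual ImageSharp code) of the heap-array pattern described above: the weights live in a `float[,]`, so every read is a bounds-checked load from the managed heap inside nested loops.

```csharp
// Illustrative only: mimics the current Fast2DArray-style pattern.
public sealed class HeapKernel
{
    private readonly float[,] weights; // heap allocation, bounds-checked indexing

    public HeapKernel(float[,] weights) => this.weights = weights;

    public float Convolve(float[] source, int offset, int stride)
    {
        float sum = 0;
        for (int y = 0; y < this.weights.GetLength(0); y++)
        {
            for (int x = 0; x < this.weights.GetLength(1); x++)
            {
                // Each iteration: bounds checks + loads from the managed heap,
                // because the kernel size is unknown to the JIT.
                sum += this.weights[y, x] * source[offset + (y * stride) + x];
            }
        }

        return sum;
    }
}
```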
### The solution

- Define `KernelMatrix{N}x{M}` structures for common kernel sizes (like `KernelMatrix9x3`, `KernelMatrix4x1`, etc.)
- Use specific kernel matrix structures (known at compile time!) as generic arguments in implementation-related classes (`ConvolutionProcessor<TColor, TKernelMatrix>`!)
  - Using `KernelMatrix9x1`-like constructs as resampler windows could bring significant performance improvements for `ResizeProcessor` (see #139)
- Implement all kernel-related calculations as simple expressions without `for` loops, etc. (like the implementations of `Matrix4x4` methods)
- Use a general matrix for unknown sizes (something similar to `Fast2DArray<float>`)
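A minimal sketch of what such a structure could look like, using a hypothetical `KernelMatrix3x1` (the field names and `Convolve` signature are assumptions, not the proposed API). Because the size is fixed at compile time, the computation collapses into one flat expression, in the spirit of the `Matrix4x4` implementations mentioned above:

```csharp
// Hypothetical value-type kernel; not the actual proposed ImageSharp API.
public readonly struct KernelMatrix3x1
{
    // Weights live in fields of a struct, so the JIT can keep them in registers.
    public readonly float M1, M2, M3;

    public KernelMatrix3x1(float m1, float m2, float m3)
    {
        this.M1 = m1;
        this.M2 = m2;
        this.M3 = m3;
    }

    // One simple expression instead of a loop: no bounds checks, no heap loads.
    public float Convolve(float s1, float s2, float s3)
        => (this.M1 * s1) + (this.M2 * s2) + (this.M3 * s3);
}
```

Usage would look like `new KernelMatrix3x1(0.25f, 0.5f, 0.25f).Convolve(a, b, c)`, with the concrete struct type flowing in as the `TKernelMatrix` generic argument.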
### TODO

- [ ] Implement a prototype/emulation benchmark comparing the two convolution implementations: heap matrix vs. value matrix
- [ ] Design and implement the real thing based on the conclusions
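The first TODO item could start from a BenchmarkDotNet skeleton like the following. Everything here is an assumed setup (a 3x1 kernel swept over a flat buffer), not the eventual benchmark: it only emulates the difference between weights read from a heap array and weights held in locals the JIT can enregister.

```csharp
using BenchmarkDotNet.Attributes;

public class ConvolutionBenchmark
{
    private float[,] heapKernel;
    private float[] source;

    [GlobalSetup]
    public void Setup()
    {
        this.heapKernel = new float[,] { { 0.25f, 0.5f, 0.25f } };
        this.source = new float[4096];
    }

    [Benchmark(Baseline = true)]
    public float HeapMatrix()
    {
        // Current pattern: weights re-read from the heap in a nested loop.
        float sum = 0;
        for (int i = 0; i < this.source.Length - 3; i++)
        {
            for (int x = 0; x < 3; x++)
            {
                sum += this.heapKernel[0, x] * this.source[i + x];
            }
        }

        return sum;
    }

    [Benchmark]
    public float ValueMatrix()
    {
        // Proposed pattern: fixed-size weights as locals, flat expression.
        float m1 = 0.25f, m2 = 0.5f, m3 = 0.25f;
        float sum = 0;
        for (int i = 0; i < this.source.Length - 3; i++)
        {
            sum += (m1 * this.source[i]) + (m2 * this.source[i + 1]) + (m3 * this.source[i + 2]);
        }

        return sum;
    }
}

// Run with: BenchmarkDotNet.Running.BenchmarkRunner.Run<ConvolutionBenchmark>();
```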
### Remarks

The concept is very similar to the idea behind `Block8x8F` in Jpeg. Utilizing information that is known at compile time is a very efficient technique to improve performance.
@antonfirsov I think we should try and make this a priority now if we're gonna get it in for V1.
@JimBobSquarePants hmm .. originally I wanted to polish + standardize our memory management in the next few months, but there are many uncertainties around that topic, so maybe I'll grab this one instead.
There is a lot of work here, though; maybe we can make it into 1.0.
@antonfirsov Have a look and estimate it then. Looking at it myself I'm envisaging quite a lot of API changes to allow it which makes me think maybe it's for V2 instead.