# Proposal: Value Type Kernels
### Prerequisites
- [x] I have written a descriptive issue title
- [x] I have verified that I am running the latest version of ImageSharp
- [x] I have searched open and closed issues to ensure it has not already been reported
### The problem

- There are many places in the library where we operate on relatively small matrices known at compile time using a dynamically allocated heap array (usually wrapped in `Fast2DArray`):
  - `ConvolutionProcessor`-s
  - `PatternBrush`-es
  - `ResizeProcessor` weight windows
- This approach is easy to code, but has implicit overheads with a significant negative effect on performance:
  - Indexing heap arrays increases the probability of cache misses
  - The JIT is not able to apply optimized register allocation with dynamic arrays. The data is always accessed by "read from memory" (`mov`?) CPU instructions.
  - Calculations iterate using nested `for` loops instead of being simple expressions
- All in all: the generated machine code is more complex, longer, and slower to execute than it should be
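To make the overhead concrete, here is a hypothetical sketch (the class and method names are illustrative, not the actual ImageSharp code) of the heap-array pattern described above: the weights live in a `float[,]`, so every read is a bounds-checked load from the managed heap inside nested loops.

```csharp
// Illustrative only: mimics the current Fast2DArray-style pattern.
public sealed class HeapKernel
{
    private readonly float[,] weights; // heap allocation, bounds-checked indexing

    public HeapKernel(float[,] weights) => this.weights = weights;

    public float Convolve(float[] source, int offset, int stride)
    {
        float sum = 0;
        for (int y = 0; y < this.weights.GetLength(0); y++)
        {
            for (int x = 0; x < this.weights.GetLength(1); x++)
            {
                // Each iteration: bounds checks + loads from the managed heap,
                // because the kernel size is unknown to the JIT.
                sum += this.weights[y, x] * source[offset + (y * stride) + x];
            }
        }

        return sum;
    }
}
```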
### The solution

- Define `KernelMatrix{N}x{M}` structures for common kernel sizes (like `KernelMatrix9x3`, `KernelMatrix4x1`, etc.)
- Use specific kernel matrix structures (known at compile time!) as generic arguments in implementation-related classes (`ConvolutionProcessor<TColor, TKernelMatrix>`!)
  - Using `KernelMatrix9x1`-like constructs as resampler windows could bring significant performance improvements for `ResizeProcessor` (see #139)
- Implement all kernel-related calculations as simple expressions without `for` loops, etc. (like the implementations of `Matrix4x4` methods)
- Use a general matrix for unknown sizes (something similar to `Fast2DArray<float>`)
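A minimal sketch of what such a structure could look like, using a hypothetical `KernelMatrix3x1` (the field names and `Convolve` signature are assumptions, not the proposed API). Because the size is fixed at compile time, the computation collapses into one flat expression, in the spirit of the `Matrix4x4` implementations mentioned above:

```csharp
// Hypothetical value-type kernel; not the actual proposed ImageSharp API.
public readonly struct KernelMatrix3x1
{
    // Weights live in fields of a struct, so the JIT can keep them in registers.
    public readonly float M1, M2, M3;

    public KernelMatrix3x1(float m1, float m2, float m3)
    {
        this.M1 = m1;
        this.M2 = m2;
        this.M3 = m3;
    }

    // One simple expression instead of a loop: no bounds checks, no heap loads.
    public float Convolve(float s1, float s2, float s3)
        => (this.M1 * s1) + (this.M2 * s2) + (this.M3 * s3);
}
```

Usage would look like `new KernelMatrix3x1(0.25f, 0.5f, 0.25f).Convolve(a, b, c)`, with the concrete struct type flowing in as the `TKernelMatrix` generic argument.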
### TODO

- [ ] Implement a prototype/emulation benchmark comparing the two convolution implementations: heap matrix vs. value matrix
- [ ] Design and implement the real thing based on the conclusions
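The first TODO item could start from a BenchmarkDotNet skeleton like the following. Everything here is an assumed setup (a 3x1 kernel swept over a flat buffer), not the eventual benchmark: it only emulates the difference between weights read from a heap array and weights held in locals the JIT can enregister.

```csharp
using BenchmarkDotNet.Attributes;

public class ConvolutionBenchmark
{
    private float[,] heapKernel;
    private float[] source;

    [GlobalSetup]
    public void Setup()
    {
        this.heapKernel = new float[,] { { 0.25f, 0.5f, 0.25f } };
        this.source = new float[4096];
    }

    [Benchmark(Baseline = true)]
    public float HeapMatrix()
    {
        // Current pattern: weights re-read from the heap in a nested loop.
        float sum = 0;
        for (int i = 0; i < this.source.Length - 3; i++)
        {
            for (int x = 0; x < 3; x++)
            {
                sum += this.heapKernel[0, x] * this.source[i + x];
            }
        }

        return sum;
    }

    [Benchmark]
    public float ValueMatrix()
    {
        // Proposed pattern: fixed-size weights as locals, flat expression.
        float m1 = 0.25f, m2 = 0.5f, m3 = 0.25f;
        float sum = 0;
        for (int i = 0; i < this.source.Length - 3; i++)
        {
            sum += (m1 * this.source[i]) + (m2 * this.source[i + 1]) + (m3 * this.source[i + 2]);
        }

        return sum;
    }
}

// Run with: BenchmarkDotNet.Running.BenchmarkRunner.Run<ConvolutionBenchmark>();
```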
### Remarks

The concept is very similar to the idea behind `Block8x8F` in Jpeg. Utilizing information that is known at compile time is a very efficient technique to improve performance.
@antonfirsov I think we should try and make this a priority now if we're gonna get it in for V1.
@JimBobSquarePants hmm .. originally I wanted to polish + standardize our memory management in the next few months, but there are many uncertainties around that topic, so maybe I'll grab this one instead.
There is a lot of work here, though; maybe we can make it into 1.0.
@antonfirsov Have a look and estimate it then. Looking at it myself I'm envisaging quite a lot of API changes to allow it which makes me think maybe it's for V2 instead.