ImageSharp icon indicating copy to clipboard operation
ImageSharp copied to clipboard

Proposal: Value Type Kernels

Open antonfirsov opened this issue 8 years ago • 3 comments

Prerequisites

  • [x] I have written a descriptive issue title
  • [x] I have verified that I am running the latest version of ImageSharp
  • [x] I have searched open and closed issues to ensure it has not already been reported

The problem

  • There are many places in the library where we operate on relatively small matrices known at compile time using a dynamically allocated heap array (usually wrapped in Fast2DArray):
    • ConvolutionProcessor-s
    • PatternBrush-es
    • ResizeProcessor weight windows
  • This approach is easy to code, but has implicit overheads with a significant negative effect on performance:
    • Indexing heap arrays increases the probabilty of cache misses
    • The JIT is not able to apply optimized register allocation with dynamic arrays. The data is always accessed by "read from memory" (mov ?) CPU instructions.
    • Calculations are iterating using nested for loops instead of having a simple expression
    • All in all: the generated machine code is more complex, longer, slower to execute than it should be

The solution

  • Define KernelMatrix{N}x{M} structures for common kernel sizes (Like KernelMatrix9x3, KernelMatrix4x1 etc.)
  • Use specific kernel matrix structures (known at compile time!) as a generic arguments in implementation related classes (ConvolutionProcessor<TColor, TKernelMatrix> !)
    • Using KernelMatrix9x1-like constructs as resampler windows could bring significant performance improvements for ResizeProcessor (see #139)
  • Implement all kernel-related calculations as simple expressions without for loops, etc (Like implementations of Matrix4x4 methods)
  • Use a general matrix for unknown sizes (something similar toFast2DArray<float>)

TODO

  • [ ] Implement a prototype/emulation benchmark comparing 2 Convolution implementations: heap matrix VS value matrix
  • [ ] Design and implement stuff based on conclusions

Remarks

The concept is very similar to the idea behind Block8x8F in Jpeg. Utilizing information that is known at compile time is a very efficient technique to improve performance.

antonfirsov avatar Mar 18 '17 21:03 antonfirsov

@antonfirsov I think we should try and make this a priority now if we're gonna get it in for V1.

JimBobSquarePants avatar Oct 08 '17 01:10 JimBobSquarePants

@JimBobSquarePants hmm .. originally I wanted to polish + standardize our memory management in the next few months, but there are many uncertainties around that topic, so maybe I'll grab this one instead.

There is a lot of work here though, but maybe we can make it into 1.0.

antonfirsov avatar Oct 08 '17 18:10 antonfirsov

@antonfirsov Have a look and estimate it then. Looking at it myself I'm envisaging quite a lot of API changes to allow it which makes me think maybe it's for V2 instead.

JimBobSquarePants avatar Oct 08 '17 21:10 JimBobSquarePants