performance improvements for `deconv_rl.py`

Open VolkerH opened this issue 6 years ago • 1 comments

Hi Martin,

I was looking at the FFT-based implementation of Richardson-Lucy (RL) deconvolution in `deconv_rl.py` and noticed a few things:

1. The FFT plan is precomputed but not actually passed to the FFT functions, which results in some overhead.
2. For deconvolutions that are performed repeatedly with the same PSF on data of the same size, a lot of code is needlessly run again.
3. There is a lot of code duplication between the functions that take NumPy arrays and the ones that take OpenCL arrays.
4. I do not understand why `hflip = h[::-1, ::-1]` is needed. I am also not sure whether it is correct: I assume that for the 3D case this would have to be `hflip = h[::-1, ::-1, ::-1]`. Maybe you can explain.
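
To make the flip question concrete, here is a minimal NumPy sketch. The array `h` and the helper `circular_flip` are made up for illustration and are not taken from gputools. It checks two things: on a 3D array, `h[::-1, ::-1]` reverses only the first two axes, and for a real PSF the complex conjugate of its FFT equals the FFT of the PSF mirrored along every axis. The second point matters because the textbook RL update correlates with the PSF, which is the same as convolving with the mirrored PSF, so this may be what the flip is meant to provide.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.random((4, 5, 6))  # hypothetical 3D PSF, values are arbitrary

flip_two = h[::-1, ::-1]   # reverses axes 0 and 1 only, axis 2 is untouched
flip_all = np.flip(h)      # reverses every axis, same as h[::-1, ::-1, ::-1]


def circular_flip(a):
    """Return a[-n mod N] along every axis (flip, then roll by one sample)."""
    return np.roll(np.flip(a), shift=1, axis=tuple(range(a.ndim)))


# For real input, FFT of the circularly mirrored array == conj(FFT of the array).
lhs = np.fft.fftn(circular_flip(h))
rhs = np.conj(np.fft.fftn(h))

print(np.array_equal(flip_all, h[::-1, ::-1, ::-1]))
print(np.array_equal(flip_two, flip_all))
print(np.allclose(lhs, rhs))
```

If the code works in Fourier space anyway, multiplying by the conjugate of the PSF transform would avoid the explicit flip and is independent of the number of dimensions.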

To address the first two points, I have rewritten your code as a test. The rewritten code is here: https://github.com/VolkerH/Lattice_Lightsheet_Deskew_Deconv/blob/benchmarking/lls_dd/deconv_gputools_rewrite.py. I wasn't sure whether, and if so how, you would like to integrate this approach of setting up the deconvolution first into gputools; otherwise I would have edited it there and created a pull request.
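
The idea of setting up the deconvolution first can be sketched in plain NumPy as follows. This is only an illustration under stated assumptions: the class name `RLDeconvolver`, its parameters, and the circular boundary handling are hypothetical, and it is neither the gputools API nor the rewritten code linked above. Everything that depends only on the PSF and the data shape is computed once in the constructor and reused for every volume.

```python
import numpy as np


class RLDeconvolver:
    """Hypothetical helper: precompute PSF-dependent state once, reuse per volume."""

    def __init__(self, psf, shape):
        self.shape = tuple(shape)
        self.axes = tuple(range(len(self.shape)))
        psf = psf / psf.sum()  # normalise so that flux is conserved
        # Zero-pad the PSF to the data shape and move its centre to the origin.
        padded = np.zeros(self.shape, dtype=np.float64)
        padded[tuple(slice(0, s) for s in psf.shape)] = psf
        shift = tuple(-(s // 2) for s in psf.shape)
        padded = np.roll(padded, shift, axis=self.axes)
        # Computed once: transfer function and its conjugate (the mirrored PSF).
        self.otf = np.fft.rfftn(padded, axes=self.axes)
        self.otf_conj = np.conj(self.otf)

    def _apply(self, x, otf):
        # Circular convolution via the precomputed transfer function.
        spectrum = np.fft.rfftn(x, axes=self.axes) * otf
        return np.fft.irfftn(spectrum, s=self.shape, axes=self.axes)

    def run(self, data, n_iter=10, eps=1e-6):
        # Plain Richardson-Lucy iterations starting from a flat estimate.
        estimate = np.full(self.shape, data.mean(), dtype=np.float64)
        for _ in range(n_iter):
            blurred = self._apply(estimate, self.otf)
            ratio = data / np.maximum(blurred, eps)
            estimate *= self._apply(ratio, self.otf_conj)
        return estimate
```

Usage would be one setup call followed by one `run` per volume, for example `dec = RLDeconvolver(psf, volume.shape)` and then `dec.run(volume)` inside the loop over time points. On the GPU the same pattern would also hold the FFT plan and the device buffers in the object.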

I have done some benchmarks comparing the rewritten code to the current implementation in gputools and to flowdec: https://github.com/VolkerH/Lattice_Lightsheet_Deskew_Deconv/issues/21. Note that the iteration times cover not only deconvolution but also I/O and affine transforms, which adds plenty of overhead. Without this overhead the speed improvements are even more significant.

Mar 15 '19 13:03 VolkerH

Just to add, this is not urgent. I think I can figure out some of this myself; I just wanted to keep you in the loop.

Mar 15 '19 13:03 VolkerH