
Worse performance using datashader?

Open LucaMarconato opened this issue 1 year ago • 6 comments

I wrote some benchmarks, available here https://github.com/scverse/spatialdata-plot/pull/295 (they can simply be run as tests), and I have noticed that the datashader performance is worse than the matplotlib-based one.

I think this may be due to the size of the canvas used by datashader, since in the MERFISH example here https://github.com/scverse/spatialdata-plot/pull/243 the performance was (as expected) better.

Therefore, using a smaller default canvas size may fix the issue. @Sonja-Stockhaus could you please have a look into this?

LucaMarconato avatar Jul 13 '24 15:07 LucaMarconato

Here are the results of a (single) run of the tests (the timings are consistent across multiple manual runs).

image

LucaMarconato avatar Jul 13 '24 15:07 LucaMarconato

With the fix that I proposed for the performance bug here https://github.com/scverse/spatialdata-plot/issues/297, the performance gap is much bigger:

image

LucaMarconato avatar Jul 13 '24 15:07 LucaMarconato

@Sonja-Stockhaus my "didn't-look-at-the-code" theory is that datashader generates too large of an image which then bypasses the rasterisation-downsampling logic. Wdyt?

timtreis avatar Jul 14 '24 16:07 timtreis

Yep, datashader generates an image that is exactly the size of the extent (large extent = large image = long runtime). I'll think of sth so that we can use a smaller canvas size and then maybe rasterize or so to bring it back to the original scale. Do we want a heuristic again to decide on the "smaller canvas size"?

I also noticed that for datashader, e.g. the radius of the points is relative to the axes, which is not the case for matplotlib. So for a large extent you need extremely large point sizes to even make them visible at all with datashader. This should be made consistent with matplotlib.

Sonja-Stockhaus avatar Jul 15 '24 19:07 Sonja-Stockhaus

Thanks for the explanation. I would reuse the logic of _rasterize_if_necessary() or _multiscale_to_spatial_image() to take the dpi of the figure and the fig_size into consideration, since the extent could be extremely large, but in the end we are limited by the pixels available on screen/paper for plotting.
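
To make the idea concrete, here is a rough sketch of how the canvas size could be capped by the pixels available in the figure. It is not the existing logic of _rasterize_if_necessary(); the function name `capped_canvas_size` and its parameters are hypothetical, and the defaults simply mirror matplotlib's default figure size (6.4 x 4.8 inches) and dpi (100).

```python
def capped_canvas_size(extent, figsize=(6.4, 4.8), dpi=100.0):
    """Return (plot_width, plot_height) in pixels for a rasterization canvas.

    Hypothetical helper, for illustration only.
    extent: (x_min, x_max, y_min, y_max) in data units.
    figsize: figure size in inches (width, height).
    dpi: figure resolution in dots per inch.
    """
    x_min, x_max, y_min, y_max = extent
    ext_w = x_max - x_min
    ext_h = y_max - y_min
    if ext_w <= 0 or ext_h <= 0:
        raise ValueError("extent must have positive width and height")

    # Pixels that the figure can actually display.
    max_w = figsize[0] * dpi
    max_h = figsize[1] * dpi

    # Shrink so the canvas fits into the figure, keep the aspect ratio,
    # and never upscale a small extent (scale is capped at 1).
    scale = min(1.0, max_w / ext_w, max_h / ext_h)

    width = max(1, round(ext_w * scale))
    height = max(1, round(ext_h * scale))
    return width, height


# A 20000 x 30000 extent would otherwise mean a 600-megapixel image.
print(capped_canvas_size((0, 20000, 0, 30000)))  # (320, 480)
# A small extent is left at its original size.
print(capped_canvas_size((0, 100, 0, 50)))  # (100, 50)
```

The resulting width and height could then be passed as `plot_width` and `plot_height` to `datashader.Canvas`, together with `x_range` and `y_range` set to the original extent, so that the axes still show the original coordinates.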

LucaMarconato avatar Jul 16 '24 14:07 LucaMarconato

Btw, off-topic comment: when plotting Visium HD data as points/circles, I noticed a Moiré pattern due to the presence of a small rotation in the raw data. With datashader rasterization the Moiré pattern disappears, which is great! So datashader could also have this nice use case beyond improved performance.

LucaMarconato avatar Jul 16 '24 14:07 LucaMarconato