get_best_level_for_downsample too strict, wasting performance?
Context
Issue type (bug report or feature request): Feature request
Operating system (e.g. Fedora 24, Mac OS 10.11, Windows 10): Win7
Platform (e.g. 64-bit x86, 32-bit ARM): 64-bit
OpenSlide version: 3.4.1
Slide format (e.g. SVS, NDPI, MRXS): svs
Details
I have encountered a non-intuitive and slow behavior of OpenSlide several times. Consider a large slide with these 3 level_dimensions stored in the file:

level 0: 55776 x 42423
level 1: 13944 x 10605
level 2: 3486 x 2651
In total, Deep Zoom simulates 17 z_dimensions:

55776 x 42423
27888 x 21212
13944 x 10606
6972 x 5303
3486 x 2652
1743 x 1326
.... x ....
Interestingly, OpenSlide's get_best_level_for_downsample returns the following slide_from_dz_level values for those z_dimensions:

0, 0, 0, 1, 1, 2, 2, 2, ..., 2
This means:
1. for DZ level 55776 x 42423, it samples from file level 0 (55776 x 42423),
2. for DZ level 27888 x 21212, it also samples from file level 0 (55776 x 42423) and downscales,
3. for DZ level 13944 x 10606, it also samples from file level 0 (55776 x 42423) and downscales,
4. for DZ level 6972 x 5303, it samples from file level 1 (13944 x 10605) and downscales,
5. for DZ level 3486 x 2652, it also samples from file level 1 (13944 x 10605) and downscales,
6. for DZ level 1743 x 1326, it samples from file level 2 (3486 x 2651) and downscales,
7. then it continues sampling from level 2, since there is nothing smaller in the file.
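A minimal sketch of the strict selection rule, using the level dimensions from this slide, reproduces the mapping above (the function body and the averaged-downsample computation are my reconstruction for illustration, not OpenSlide's actual source):

```python
# Level dimensions from this issue's slide
level_dims = [(55776, 42423), (13944, 10605), (3486, 2651)]
l0_w, l0_h = level_dims[0]

# OpenSlide derives a single downsample per level; averaging the two
# axis ratios is my approximation of that computation.
level_downsamples = [((l0_w / w) + (l0_h / h)) / 2 for w, h in level_dims]

def best_level_for_downsample(downsample):
    # Strict rule: pick the highest level whose downsample
    # does not exceed the requested one.
    best = 0
    for i, ds in enumerate(level_downsamples):
        if ds <= downsample:
            best = i
    return best

# DZ level 13944 x 10606 requests downsample 4, but level 1's downsample
# is slightly above 4 (its height is 10605, not 10606), so the strict
# rule falls back to level 0.
```

With these numbers, requesting downsample 4 returns level 0 and requesting 16 returns level 1, which is exactly the behavior described above.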
I understand the algorithm and why it is doing so: the constraint is that it samples from the next larger or equally sized level in the file.
But (3.) seems counter-intuitive:
If we loosened the constraint and did not require a strict downsample, we could sample DZ level 13944 x 10606 from file level 1 (13944 x 10605), which reduces the extent of resizing.
Analogously for (5.): sampling 3486 x 2652 from file level 2 (3486 x 2651).
Thus, the whole get_tile(...) function could be much faster, since a lot of the resizing cost vanishes.
Of course, (3.) and (5.) behave this way because the file level's height is 1 pixel too small, so sampling from it would technically be an upsample. But falling back to a much larger level seems too harsh a penalty for this one pixel.
This problem recurs in many other slides that have such rounding issues with an odd width or height.
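To make the suggestion concrete, a tolerant variant could accept a level whose downsample overshoots the request by a small relative margin; the function name and the `rel_tol` parameter here are hypothetical, not an existing OpenSlide API:

```python
# Level dimensions from this issue's slide
level_dims = [(55776, 42423), (13944, 10605), (3486, 2651)]
l0_w, l0_h = level_dims[0]
level_downsamples = [((l0_w / w) + (l0_h / h)) / 2 for w, h in level_dims]

def tolerant_best_level(downsample, rel_tol=1e-3):
    # Accept levels whose downsample exceeds the request by at most
    # rel_tol, so a 1-pixel rounding mismatch no longer forces a
    # fallback to a much larger level.
    best = 0
    for i, ds in enumerate(level_downsamples):
        if ds <= downsample * (1 + rel_tol):
            best = i
    return best
```

With a 0.1% tolerance, downsample 4 now maps to level 1 and downsample 16 to level 2, avoiding the expensive 4x resize in both cases.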
What do you think?
Best, Peter
Actually, it seems to me that Deep Zoom is picking the wrong dimensions. If you start with 55776 x 42423 and halve both dimensions, I'd expect to get 27888 x 21211, or else you have to invent half a pixel worth of data. The next level would then be 13944 x 10605 and match file level 1 perfectly.
Of course, in this case you lose half a pixel worth of data on the first downscale, and effectively 2 pixels worth on the next level, etc., until you hit an evenly divisible number.
I guess the loss of pixels is actually the bigger deal. The 13944 x 10605 image has lost 3 pixels from the original, but were they all taken from one side, or divided across both sides? Or did the original image get scaled so that it actually has interpolated values of all the original data on the new grid?
For the 13944 x 10606 version, only a single pixel had to be added, possibly by doubling the last row or column of the original full-resolution image. But because it is 2 levels up from the full-resolution image, only 25% of that last pixel actually consists of data that was not in the full-resolution image.
Continuing this line of reasoning to the higher levels: 3486 x 2651 is 1/16th of a 55776 x 42416 image, so it has lost 7 pixels somewhere in the scaling/rounding, unless again it somehow computes interpolated values. The corresponding Deep Zoom level 3486 x 2652 scales to 55776 x 42432, so it has added 9 pixels to the original image. Those 9 pixels really account for only half of the edge pixel of the final scaled-down version. Something could be said for adding such extra pixels by splitting them between both sides of the image instead of appending them at one end, which is what I think OpenSlide does now.
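The pixel bookkeeping above checks out arithmetically, using the heights from this issue:

```python
l0_h = 42423   # level 0 height
file_h = 2651  # file level 2 height
dz_h = 2652    # Deep Zoom height at the same 16x scale

# File level 2 covers 16 * 2651 = 42416 rows: 7 fewer than level 0.
print(l0_h - 16 * file_h)  # 7

# The Deep Zoom level implies 16 * 2652 = 42432 rows: 9 more than level 0.
print(16 * dz_h - l0_h)  # 9
```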
Thanks, @jaharkes, for your thoughts.
I tried to feed Deep Zoom your expected dimensions by changing

```python
z_size = tuple(max(1, int(math.ceil(z / 2))) for z in z_size)
```

to

```python
z_size = tuple(max(1, int(math.floor(z / 2))) for z in z_size)
```
But then the simulated Deep Zoom pyramid no longer has 17 levels, only 16. This leads to an index-out-of-bounds exception, since OpenSeadragon still assumes 17 levels for this image.
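The 17-versus-16 level counts can be reproduced with a small sketch of the halving loop (`dz_level_count` is my name for it, not part of the library):

```python
import math

def dz_level_count(w, h, halve):
    """Count Deep Zoom levels when repeatedly halving the full
    resolution with the given rounding function."""
    size = (w, h)
    levels = 1
    while size[0] > 1 or size[1] > 1:
        size = tuple(max(1, int(halve(z / 2))) for z in size)
        levels += 1
    return levels

print(dz_level_count(55776, 42423, math.ceil))   # 17
print(dz_level_count(55776, 42423, math.floor))  # 16
```

With `math.floor`, the width sequence reaches 1 one step earlier (1743 halves to 871 instead of 872, and so on), which is why the pyramid loses a level.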
This leads to my assumption that OpenSlide and OpenSeadragon both follow the convention of calculating the pyramid with math.ceil, so as not to lose any pixel information. But the scanner that generated the file might follow another convention, namely dropping the trailing pixels when the dimensions are not a multiple of the integer level downsample.
Still, it would be nice if OpenSlide could tolerate this pixel gap (which disappears anyway when we zoom in, since the downsample then becomes 1).