TopoStats Add upper limit to thresholding

This PR adds the ability to set both a minimum and maximum when thresholding. See #487.

From a user-perspective this changes the config files from having a single value in the thresholding sections, to having two values like how absolute_area_thresholds have:

  otsu_threshold_multiplier: 1.0
  threshold_std_dev:
    below: [10.0, null] # Minimum and maximum thresholds for data below the image background
    above: [1.0, null] # Minimum and maximum thresholds for data above the image background
  threshold_absolute:
    below: [-1.0, null] # Minimum and maximum thresholds for data below the image background
    above: [1.0, null] # Minimum and maximum thresholds for data above the image background

Note that:

Otsu method still takes a single fudge factor, as the nature of this threshold is to perform binary segmentation.
null is an accepted value when you don't want to set an minimum and or maximum limit to the thresholds.

See here where minicircle.spm has been processed with threshold_std_dev upper as [1.0, 3.5]. So this is masking points that are between 1.0 and 3.5 standard deviations from the mean of the image.

Technical notes: Internally the way thresholding is done has been overhauled.

Tests have been added for previously untested 😱 utils.py's _get_mask and get_mask (now called get_and_combine_directional_masks for specificity)
The "above" and "below" keys in threshold dictionaries are now considered to be always existent / expected.
Skipping directions is now handled by detecting if the threshold in "above" or "below" is None or not. This makes it a little cleaner I think.
utils.py's get_threshold is monolithic and huge, but I tried refactoring it in clever and elegant ways, and the solutions I came up with required some rather less clear code, so I have purposely left the function verbose as to be more accessible / readable. Happy to have pushback on this though!
As you can see, no image related tests have changed, so this does not affect the results of processing unintentionally.
I had to get rid of the lovely lambda statements in validation.py as I do not know how to get lambda statements to be okay with having a None type too (required to allow null values in the config).
The tests are rather verbose now, as to compare dictionaries with np.Infinity in them, we must unpack it and use the np.isposinf() or np.isneginf() methods where infinities are encountered. This is because np.Infinity != np.Infinity.

Sep 14 '23 22:09 SylviaWhittle

I'll update the notebooks & documentation now, I almost forgot!

Sep 14 '23 22:09 SylviaWhittle

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (bbd775e) 83.42% compared to head (33fbd5b) 83.71%. Report is 184 commits behind head on main.

:exclamation: Current head 33fbd5b differs from pull request most recent head d366170. Consider uploading reports for the commit d366170 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #666      +/-   ##
==========================================
+ Coverage   83.42%   83.71%   +0.29%     
==========================================
  Files          21       21              
  Lines        3047     3089      +42     
==========================================
+ Hits         2542     2586      +44     
+ Misses        505      503       -2

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Sep 14 '23 22:09 codecov-commenter

I'd appreciate a preliminary review so that I can update the documentation after we have settled on the right course of action if that's okay 😅

@derollins I'd really like your input too from a user-perspective if you have time 🙏

Sep 14 '23 22:09 SylviaWhittle

Tried this, seems to work although there is a ring around the objects I don't want to mask... which is inevitable and not the fault of the code. 20230413_Cas9_RNA_complex_DNA_MCON_4ng_NiCl2_3mM_HEPES_20mM 0_00027_above_masked

Sep 15 '23 11:09 derollins

Do we think this is an actually useful feature to add, @derollins? It adds complexity and so I'm hesitant to add it unless we are sure it is necessary or needed.

Sep 29 '23 00:09 SylviaWhittle

I'm all for following the KISS principle.

Oct 03 '23 10:10 ns-rse

Tests updated and coverage has been made comprehensive thanks to CodeCov's ability to detect per-line coverage. 🔧

Oct 10 '23 18:10 SylviaWhittle

I think I have addressed all but one of your points @ns-rse. The schema doesn't seem to like using lambda functions in Or sadly 😢

Oct 10 '23 18:10 SylviaWhittle

Note to self: make the config take both single values (counted as the lower threshold) and two values (lower and upper), parsed when reading the config. This is to reduce complexity of the config file and because two thresholds will be an extremely niche feature.

Jan 24 '24 13:01 SylviaWhittle

Closing as this still has not been requested by users and it would be easier to re-make rather than update this old PR. Can easily be ressurected later if needed.

Jun 08 '24 22:06 SylviaWhittle