Add upper limit to thresholding
This PR adds the ability to set both a minimum and maximum when thresholding. See #487.
From a user-perspective this changes the config files from having a single value in the thresholding sections, to having two values like how absolute_area_thresholds have:
otsu_threshold_multiplier: 1.0
threshold_std_dev:
below: [10.0, null] # Minimum and maximum thresholds for data below the image background
above: [1.0, null] # Minimum and maximum thresholds for data above the image background
threshold_absolute:
below: [-1.0, null] # Minimum and maximum thresholds for data below the image background
above: [1.0, null] # Minimum and maximum thresholds for data above the image background
Note that:
- Otsu method still takes a single fudge factor, as the nature of this threshold is to perform binary segmentation.
-
nullis an accepted value when you don't want to set an minimum and or maximum limit to the thresholds.
See here where minicircle.spm has been processed with threshold_std_dev upper as [1.0, 3.5]. So this is masking points that are between 1.0 and 3.5 standard deviations from the mean of the image.
Technical notes: Internally the way thresholding is done has been overhauled.
- Tests have been added for previously untested 😱
utils.py's_get_maskandget_mask(now calledget_and_combine_directional_masksfor specificity) - The
"above"and"below"keys in threshold dictionaries are now considered to be always existent / expected. - Skipping directions is now handled by detecting if the threshold in
"above"or"below"isNoneor not. This makes it a little cleaner I think. -
utils.py'sget_thresholdis monolithic and huge, but I tried refactoring it in clever and elegant ways, and the solutions I came up with required some rather less clear code, so I have purposely left the function verbose as to be more accessible / readable. Happy to have pushback on this though! - As you can see, no image related tests have changed, so this does not affect the results of processing unintentionally.
- I had to get rid of the lovely lambda statements in
validation.pyas I do not know how to get lambda statements to be okay with having aNonetype too (required to allownullvalues in the config). - The tests are rather verbose now, as to compare dictionaries with
np.Infinityin them, we must unpack it and use thenp.isposinf()ornp.isneginf()methods where infinities are encountered. This is becausenp.Infinity != np.Infinity.
I'll update the notebooks & documentation now, I almost forgot!
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
bbd775e) 83.42% compared to head (33fbd5b) 83.71%. Report is 184 commits behind head on main.
:exclamation: Current head 33fbd5b differs from pull request most recent head d366170. Consider uploading reports for the commit d366170 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #666 +/- ##
==========================================
+ Coverage 83.42% 83.71% +0.29%
==========================================
Files 21 21
Lines 3047 3089 +42
==========================================
+ Hits 2542 2586 +44
+ Misses 505 503 -2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I'd appreciate a preliminary review so that I can update the documentation after we have settled on the right course of action if that's okay 😅
@derollins I'd really like your input too from a user-perspective if you have time 🙏
Tried this, seems to work although there is a ring around the objects I don't want to mask... which is inevitable and not the fault of the code.
Do we think this is an actually useful feature to add, @derollins? It adds complexity and so I'm hesitant to add it unless we are sure it is necessary or needed.
I'm all for following the KISS principle.
Tests updated and coverage has been made comprehensive thanks to CodeCov's ability to detect per-line coverage. 🔧
I think I have addressed all but one of your points @ns-rse. The schema doesn't seem to like using lambda functions in Or sadly 😢
Note to self: make the config take both single values (counted as the lower threshold) and two values (lower and upper), parsed when reading the config. This is to reduce complexity of the config file and because two thresholds will be an extremely niche feature.
Closing as this still has not been requested by users and it would be easier to re-make rather than update this old PR. Can easily be ressurected later if needed.