TopoStats Sylvia whittle/height trace redo

Adds the option of saving grain trace heights & grain trace cumulative distances to JSON file.

Trace heights mean the pixel heights of every coordinate in the ordered trace. It is an array / list.

Trace cumulative distances are the distances for each pixel as you traverse the ordered trace. Distances only go up by 1.0 or sqrt(2) since pixels are either adjacent or diagonal. Hope that makes sense.

They are saved in files like this:

The structure of the JSON is a dictionary where the keys are the molecule number and the values are the lists of data (heights or cumulative distances).

[Technical notes]

Internally I kept the arrays as lists, since JSON cannot serialise numpy arrays, and I thought it would be preferable to avoid adding a script to deconstruct the dictionary, then convert the arrays to lists and then re-constructing the dictionary again. Let me know if this is silly 😋

I have kept the saving in processing.py as I plan for run_topostats to be generic and not handle extra output from workflows in the future. Feel free to disagree with me on this, but I think maybe we should let workflows save data as they see fit during processing of individual images without them having to return it to run_topostats since I foresee workflows saving a lot of random stuff that we don't necessarily want to write extra code to handle in run_topostats?

Oct 12 '23 16:10 SylviaWhittle

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (8bfa700) 84.36% compared to head (9763a47) 84.14%. Report is 5 commits behind head on main.

:exclamation: Current head 9763a47 differs from pull request most recent head e257349. Consider uploading reports for the commit e257349 to get more accurate results

Files	Patch %	Lines
topostats/processing.py	13.33%	13 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #695      +/-   ##
==========================================
- Coverage   84.36%   84.14%   -0.23%     
==========================================
  Files          21       21              
  Lines        3134     3172      +38     
==========================================
+ Hits         2644     2669      +25     
- Misses        490      503      +13

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Oct 12 '23 16:10 codecov[bot]

Adding a test for the .json output is blocked by #653

Oct 16 '23 10:10 SylviaWhittle

Not looked at the code yet, and this may be part of the JSON output but there's no test to see what it looks like (sorry for banging the same :long_drum: again) but I'm wondering what the utility of knowing the heights without the locations is?

Sure summary statistics could be calculated, but if anyone wanted to plot or visualise the data elsewhere the location of the points would be required.

Oct 17 '23 19:10 ns-rse

I have kept the saving in processing.py as I plan for run_topostats to be generic and not handle extra output from workflows in the future. Feel free to disagree with me on this, but I think maybe we should let workflows save data as they see fit during processing of individual images without them having to return it to run_topostats since I foresee workflows saving a lot of random stuff that we don't necessarily want to write extra code to handle in run_topostats?

I agree with this, each module of analysis should save whatever it should/is configured to do at each stage.

Oct 25 '23 08:10 ns-rse

No longer blocked by #653

Nov 02 '23 09:11 SylviaWhittle

There has been some debate about what format the files should be saved in, and if all traces should be in one file or multiple etc, and so I'm thinking before trying to refine this further, maybe push to main and let people use it for a while to see what issues come up?

Happy to change how this works if people feel strongly on it though. Just this has been delayed due to indecision and also me not being available enough 😅

I do still need to write a test for saving the JSON as is flagged by codecov. Would you suggest breaking this out into a function and having that tested individually instead of having the messy logic loose inside run_dnatracing()?

Jan 24 '24 10:01 SylviaWhittle

:+1: Yes, definitely isolate the code into its own testable function. Test with small examples rather than full data set if possible too (even smaller than the cropped picture is likely possible, see examples of tests for write_yaml() in tests/test_io.py::test_write_config](https://github.com/AFM-SPM/TopoStats/blob/43410ab06bf366764271553a16f88d97f3c29999/tests/test_io.py#L116) which uses a dummy [CONFIG`).

Jan 24 '24 11:01 ns-rse

Note to self: Make the heights and distances all in one JSON and change molecule index to a string "grain_54".

Jan 24 '24 13:01 SylviaWhittle