Sylvia whittle/height trace redo
Adds the option of saving grain trace heights & grain trace cumulative distances to JSON file.
Trace heights mean the pixel heights of every coordinate in the ordered trace. It is an array / list.
Trace cumulative distances are the distances for each pixel as you traverse the ordered trace. Distances only go up by 1.0 or sqrt(2) since pixels are either adjacent or diagonal. Hope that makes sense.
They are saved in files like this:
The structure of the JSON is a dictionary where the keys are the molecule number and the values are the lists of data (heights or cumulative distances).
[Technical notes]
Internally I kept the arrays as lists, since JSON cannot serialise numpy arrays, and I thought it would be preferable to avoid adding a script to deconstruct the dictionary, then convert the arrays to lists and then re-constructing the dictionary again. Let me know if this is silly 😋
I have kept the saving in processing.py as I plan for run_topostats to be generic and not handle extra output from workflows in the future. Feel free to disagree with me on this, but I think maybe we should let workflows save data as they see fit during processing of individual images without them having to return it to run_topostats since I foresee workflows saving a lot of random stuff that we don't necessarily want to write extra code to handle in run_topostats?
Codecov Report
Attention: 13 lines in your changes are missing coverage. Please review.
Comparison is base (
8bfa700) 84.36% compared to head (9763a47) 84.14%. Report is 5 commits behind head on main.
:exclamation: Current head 9763a47 differs from pull request most recent head e257349. Consider uploading reports for the commit e257349 to get more accurate results
| Files | Patch % | Lines |
|---|---|---|
| topostats/processing.py | 13.33% | 13 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #695 +/- ##
==========================================
- Coverage 84.36% 84.14% -0.23%
==========================================
Files 21 21
Lines 3134 3172 +38
==========================================
+ Hits 2644 2669 +25
- Misses 490 503 +13
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Adding a test for the .json output is blocked by #653
Not looked at the code yet, and this may be part of the JSON output but there's no test to see what it looks like (sorry for banging the same :long_drum: again) but I'm wondering what the utility of knowing the heights without the locations is?
Sure summary statistics could be calculated, but if anyone wanted to plot or visualise the data elsewhere the location of the points would be required.
I have kept the saving in
processing.pyas I plan forrun_topostatsto be generic and not handle extra output from workflows in the future. Feel free to disagree with me on this, but I think maybe we should let workflows save data as they see fit during processing of individual images without them having to return it torun_topostatssince I foresee workflows saving a lot of random stuff that we don't necessarily want to write extra code to handle inrun_topostats?
I agree with this, each module of analysis should save whatever it should/is configured to do at each stage.
No longer blocked by #653
There has been some debate about what format the files should be saved in, and if all traces should be in one file or multiple etc, and so I'm thinking before trying to refine this further, maybe push to main and let people use it for a while to see what issues come up?
Happy to change how this works if people feel strongly on it though. Just this has been delayed due to indecision and also me not being available enough 😅
I do still need to write a test for saving the JSON as is flagged by codecov. Would you suggest breaking this out into a function and having that tested individually instead of having the messy logic loose inside run_dnatracing()?
:+1: Yes, definitely isolate the code into its own testable function. Test with small examples rather than full data set if possible too (even smaller than the cropped picture is likely possible, see examples of tests for write_yaml() in tests/test_io.py::test_write_config](https://github.com/AFM-SPM/TopoStats/blob/43410ab06bf366764271553a16f88d97f3c29999/tests/test_io.py#L116) which uses a dummy [CONFIG`).
Note to self: Make the heights and distances all in one JSON and change molecule index to a string "grain_54".