Reduce reader memory consumption
It adds a new output_path parameter in a reader's function that allows to save every element of the spatialdata object as soon as it's created.
This frees up part of the memory during the function's execution instead of maintaining the whole spatialdata object in memory and then saving it with sdata.write.
I've done it only for Xenium but if I receive the ok I can implement it for the rest of the readers.
Codecov Report
Attention: Patch coverage is 37.25490% with 32 lines in your changes missing coverage. Please review.
Project coverage is 45.02%. Comparing base (
cff4896) to head (43f0d04). Report is 2 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata_io/readers/xenium.py | 37.25% | 32 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #228 +/- ##
==========================================
- Coverage 45.50% 45.02% -0.48%
==========================================
Files 22 22
Lines 2149 2183 +34
==========================================
+ Hits 978 983 +5
- Misses 1171 1200 +29
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/spatialdata_io/readers/xenium.py | 66.18% <37.25%> (-5.61%) |
:arrow_down: |
Thanks for the PR @marcovarrone, I will check it and get back to you.
Closing because it's a bit outdated and dask already uses chunking to adapt the amount of memory required. The solution of saving each element makes a difference only if the reader cannot rely on dask.
I will create smaller and separate pull requests for each reader with appropriate credits for the nice work done by @laudmt, @cgeyskens and @frankandreace during the scverse x Owkin hackathon.
Thanks @marcovarrone for the update, this went out of our radar. I just add a chat with @melonora regarding this and saw the updates. I will follow up on the Xenium PR.