Add file handle API
Closes #331.
Progress of the PR
- [x] Add
file_handleAPI, - [x] implement
file_handleAPI for the following format,- [x] hspy
- [x] emd
- [x] nexus
- [x] usid
- [x] tiff
- [x] update docstring (if appropriate),
- [x] update user guide (if appropriate),
- [x] add a changelog entry in the
upcoming_changesfolder (seeupcoming_changes/README.rst), - [x] Check formatting of the changelog entry (and eventual user guide changes) in the
docs/readthedocs.org:rosettasciiobuild of this PR (link in github checks) - [x] add tests,
- [x] ready for review.
Codecov Report
Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.
Project coverage is 87.78%. Comparing base (
f94acf7) to head (decb0be). Report is 49 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| rsciio/_hierarchical.py | 66.66% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #375 +/- ##
==========================================
- Coverage 87.84% 87.78% -0.06%
==========================================
Files 85 85
Lines 11246 11257 +11
Branches 2084 2087 +3
==========================================
+ Hits 9879 9882 +3
- Misses 868 871 +3
- Partials 499 504 +5
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@CSSFrancis, what do you think about this API?
@ericpre I really like this change and think it was a long time coming. That being said I think we need to think a little bit about more how we are handling files and what the relationship between hyperspy --> dask --> underlying file. More and more it's seeming like dask not being "file aware" is a bit of a problem.
More than likely what we need is a abstract class which is something like https://docs.xarray.dev/en/latest/generated/xarray.backends.CachingFileManager.html which would allow File Objects to be passed from one process to the next without pickling. This is how xarray handles things like distributed hdf5 processing which would be wonderful :).
To maybe take it a step further, I would like to build a bit more context support for things like rechunking. I think this would involve something like the optimized slicing that b2h5py does:
https://github.com/Blosc/b2h5py/blob/317f4a0c31ac61623bf49861f3f5c6ebce46f633/b2h5py/blosc2.py#L248
Mostly it would just be optimized slicing for things like virtual images, plotting etc.
@CSSFrancis, thank you, this sounds good.
Regarding this PR, would it be better to put it on hold until a file manager is implemented to return the file manager object instead of the file handle? Even if they may have similar API, there are not the same and therefore can't be swapped without breaking the API?