rosettasciio icon indicating copy to clipboard operation
rosettasciio copied to clipboard

Add file handle API

Open ericpre opened this issue 1 year ago • 4 comments

Closes #331.

Progress of the PR

  • [x] Add file_handle API,
  • [x] implement file_handle API for the following format,
    • [x] hspy
    • [x] emd
    • [x] nexus
    • [x] usid
    • [x] tiff
  • [x] update docstring (if appropriate),
  • [x] update user guide (if appropriate),
  • [x] add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • [x] Check formatting of the changelog entry (and eventual user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in github checks)
  • [x] add tests,
  • [x] ready for review.

ericpre avatar Mar 07 '25 16:03 ericpre

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.78%. Comparing base (f94acf7) to head (decb0be). Report is 49 commits behind head on main.

Files with missing lines Patch % Lines
rsciio/_hierarchical.py 66.66% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
- Coverage   87.84%   87.78%   -0.06%     
==========================================
  Files          85       85              
  Lines       11246    11257      +11     
  Branches     2084     2087       +3     
==========================================
+ Hits         9879     9882       +3     
- Misses        868      871       +3     
- Partials      499      504       +5     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Mar 07 '25 16:03 codecov[bot]

@CSSFrancis, what do you think about this API?

ericpre avatar Mar 14 '25 18:03 ericpre

@ericpre I really like this change and think it was a long time coming. That being said I think we need to think a little bit about more how we are handling files and what the relationship between hyperspy --> dask --> underlying file. More and more it's seeming like dask not being "file aware" is a bit of a problem.

More than likely what we need is a abstract class which is something like https://docs.xarray.dev/en/latest/generated/xarray.backends.CachingFileManager.html which would allow File Objects to be passed from one process to the next without pickling. This is how xarray handles things like distributed hdf5 processing which would be wonderful :).

To maybe take it a step further, I would like to build a bit more context support for things like rechunking. I think this would involve something like the optimized slicing that b2h5py does:

https://github.com/Blosc/b2h5py/blob/317f4a0c31ac61623bf49861f3f5c6ebce46f633/b2h5py/blosc2.py#L248

Mostly it would just be optimized slicing for things like virtual images, plotting etc.

CSSFrancis avatar Mar 15 '25 18:03 CSSFrancis

@CSSFrancis, thank you, this sounds good.

Regarding this PR, would it be better to put it on hold until a file manager is implemented to return the file manager object instead of the file handle? Even if they may have similar API, there are not the same and therefore can't be swapped without breaking the API?

ericpre avatar Mar 16 '25 10:03 ericpre