micro-manager 2.0 gamma reader

Open bryantChhun opened this issue 5 years ago • 3 comments

Use Case

The open-source microscopy control tool micro-manager saves data either as single-page .tiff files or as ome.tiff files. With pycro-manager (a ZMQ-based Java-Python bridge for communication and data transfer), the need for pythonic data loading and saving is even greater.

An extension to this repository to enable easy loading and saving of micro-manager data formats would both broaden the reach of this repo and accelerate micro-manager's integration into python.

Solution

We could build off the Reader ABC, or the TiffReader, or whatever approach makes sense. In particular, it would be useful if such a reader supported:

  1. A tiff sequence reader (not just ome.tiff). Micro-manager saves images as one of two types -- ome.tiff or many single page tiffs with metadata XML file. Even if they move to other formats in the future, the backwards compatibility is important.
  2. Extracting instrument/experiment metadata and associated image data. Having named channels associated with the indexed channels would be useful. tifffile has a micromanager_metadata attribute that returns a JSON embedded in the ome.tiff.
  3. Automatic discovery of master-ome.tiff. Micro-manager breaks up individual scenes into multiple files (as is the case in the google drive data above). tifffile can identify the master ome-tiff but does not provide an easy way to intercept this. Internally, we are combing all files in the folder to look for an ome.tiff with the most scenes, and assume this is the master.

Alternatives

An alternative is to enable this type of reading directly into TiffReader/OMETiffReader, rather than make a new reader. The key difference is the amount of accompanying metadata, and whether one should expand TiffReader to check for that.

bryantChhun avatar Jan 30 '21 17:01 bryantChhun

A couple of thoughts:

An extension to this repository to enable easy loading and saving of micro-manager data formats would both broaden the reach of this repo and accelerate micro-manager's integration into python.

Totally agree!

  1. A tiff sequence reader (not just ome.tiff). Micro-manager saves images as one of two types -- ome.tiff or many single page tiffs with metadata XML file. Even if they move to other formats in the future, the backwards compatibility is important.
  2. Extracting instrument/experiment metadata and associated image data. Having named channels associated with the indexed channels would be useful. tifffile has a micromanager_metadata attribute that returns a JSON embedded in the ome.tiff.

Hmmmm we may want to have two variants of it then:

  • One that inherits from my planned GlobReader
  • One that inherits from current OmeTiffReader and returns metadata as a named tuple for interaction like so:
from aicsimageio.readers import MicroManagerReader

r = MicroManagerReader("image.ome.tiff")
r.metadata.ome  # returns OME from base OmeTiffReader
r.metadata.mm  # returns `Dict` from micromanager metadata

This is a similar approach to how CZI wrote their MicromanagerReader except I would use a NamedTuple instead of a Dict for typing purposes.
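A minimal sketch of the NamedTuple idea, assuming the OME metadata is held as an XML string and the micro-manager metadata as a plain dict. `MicroManagerMetadata`, its field types, and the sample values are hypothetical and only illustrate the access pattern; they are not part of aicsimageio.

```python
from typing import Any, Dict, NamedTuple


class MicroManagerMetadata(NamedTuple):
    """Hypothetical container pairing both metadata sources."""

    ome: str  # OME-XML from the base OME-TIFF reader (assumed to be a string here)
    mm: Dict[str, Any]  # micro-manager metadata parsed from its JSON


# Made-up sample values, for illustration only.
meta = MicroManagerMetadata(
    ome="<OME></OME>",
    mm={"Summary": {"ChNames": ["DAPI", "GFP"]}},
)

# Fields are reachable by name (which type checkers understand)...
channel_names = meta.mm["Summary"]["ChNames"]
# ...and by position, because a NamedTuple is still a tuple.
ome_xml, mm_dict = meta
```

Compared with a bare Dict, a misspelled field such as `meta.omee` fails immediately with an AttributeError and can be flagged by a type checker.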

Sidenote: there is also an argument to bake "Glob" functionality into every reader instead of having it as its own thing. I just kind of like the simplicity of handling it as a layer above base readers. The other thing I haven't really thought about is how to handle metadata aggregation for a "glob" function. Do we return a list of all the read metadata(s)? Do we try to aggregate the metadata into a single struct?

  1. Automatic discovery of master-ome.tiff. Micro-manager breaks up individual scenes into multiple files (as is the case in the google drive data above). tifffile can identify the master ome-tiff but does not provide an easy way to intercept this. Internally, we are combing all files in the folder to look for an ome.tiff with the most scenes, and assume this is the master.

I think this would be the hardest part of supporting this. It would be great if there was a file naming convention for this, i.e. all files share the same name except for the main file, which is tagged with ***.mm-main.ome.tiff. You might then have scene-1.ome.tiff, scene-2.ome.tiff, and all-scenes.mm-main.ome.tiff. But that's not really on this library; that's upstream.

evamaxfield avatar Jan 30 '21 20:01 evamaxfield

This is a similar approach to how CZI wrote their MicromanagerReader except I would use a NamedTuple instead of a Dict for typing purposes.

That is my teammate (small distinction, we are part of CZBiohub not CZInitiative). He's the one who brought your repo to my attention :-). Our goals are to both develop a broadly useful Micromanager Reader and an extension of it for our group's specific use. The latter part is a discussion point and may not be necessary.

Sidenote: there is also an argument to bake "Glob" functionality into every reader instead of having it as its own thing. I just kind of like the simplicity of handling it as a layer above base readers. The other thing I haven't really thought about is how to handle metadata aggregation for a "glob" function. Do we return a list of all the read metadata(s)? Do we try to aggregate the metadata into a single struct?

Good question regarding metadata handling. There's already one complication: the OME metadata is encoded as XML, while the micro-manager metadata is encoded as JSON. Maybe this is not a big deal, but it suggests that aggregating metadata into a single struct could be a real pain. Are there robust tools to, say, translate XML and JSON into dicts?
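For what it's worth, the standard library covers both halves of that question. Below is a minimal sketch, assuming a simple convention where XML attributes go under "@attributes", text under "#text", and child elements under their tag name as lists. The helper `element_to_dict` and the sample strings are hypothetical. Real OME-XML is namespaced, so its tags would appear as "{namespace}Image" rather than "Image". Third-party packages such as xmltodict offer a more complete conversion.

```python
import json
from xml.etree import ElementTree


def element_to_dict(element):
    """Recursively convert an XML element into a plain dict (hypothetical helper)."""
    node = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    # Children with the same tag are collected into one list.
    for child in element:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


# Made-up, namespace-free samples, for illustration only.
ome_xml = '<OME><Image ID="Image:0"><Pixels SizeC="2"/></Image></OME>'
mm_json = '{"Summary": {"Channels": 2}}'

ome_dict = element_to_dict(ElementTree.fromstring(ome_xml))
mm_dict = json.loads(mm_json)

# Keeping the two sources under separate keys avoids having to merge schemas.
combined = {"ome": ome_dict, "mm": mm_dict}
```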

I think this would be the hardest part of supporting this. It would be great if there was a file naming convention for this.

Tifffile logs warnings when it identifies non-masters. I have a hard time catching the logs using the logging api, but it's theoretically possible. Here's how tifffile logs them:

if element.tag.endswith('BinaryOnly'):
    # TODO: load OME-XML from master or companion file
    log_warning('OME series: not an ome-tiff master file')
    break
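Catching such a warning with the logging API could look roughly like the sketch below. It attaches a temporary handler to a named logger and collects the messages emitted while a function runs. The logger name "demo.reader" and the function `fake_open` are stand-ins, not tifffile itself; which logger name tifffile writes to is an assumption that would need checking against the installed version.

```python
import logging


class ListHandler(logging.Handler):
    """Collect warning-level (and above) log messages in a list."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def capture_warnings(logger_name, func):
    """Run func() and return (result, messages logged on logger_name)."""
    logger = logging.getLogger(logger_name)
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        result = func()
    finally:
        # Always detach, so repeated calls do not stack handlers.
        logger.removeHandler(handler)
    return result, handler.messages


def fake_open():
    """Stand-in for a library call that logs the non-master warning."""
    logging.getLogger("demo.reader").warning(
        "OME series: not an ome-tiff master file"
    )
    return "opened"


result, messages = capture_warnings("demo.reader", fake_open)
is_master = not any("not an ome-tiff master" in m for m in messages)
```

Matching on the message text is brittle, since the wording may change between library releases.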

bryantChhun avatar Feb 01 '21 18:02 bryantChhun

FWIW, regarding automatic ome-tiff master discovery, the BinaryOnly check in the snippet above works well. Querying tiff.scenes was terribly inefficient: it combs through the whole file for the page locations, even when the file is not the ome-tiff master. That is wasteful in micro-manager cases where the data is split across multiple files.

Here's the exact snippet I used:
import os
from xml.etree import ElementTree as etree

from tifffile import TiffFile


def tag_search(root_, tag_name='BinaryOnly'):
    """
    returns True if a direct child of root_ has a tag ending in tag_name
    """
    for element in root_:
        if element.tag.endswith(tag_name):
            print('OME series: not an ome-tiff master file')
            return True
    return False


folder_ = "."  # placeholder: directory holding the micro-manager .ome.tif files
ome_master = None  # stays None if no master file is found

for file in os.listdir(folder_):
    if not file.endswith('.ome.tif'):
        continue
    with TiffFile(os.path.join(folder_, file)) as tiff:
        print(f"checking {file} for ome-master records")
        # the OME-XML is stored in the description of the first page
        omexml = tiff.pages[0].description

        # get omexml root from first page
        try:
            root = etree.fromstring(omexml)
        except etree.ParseError:
            try:
                # retry after dropping characters that cannot be encoded
                root = etree.fromstring(omexml.encode(errors='ignore'))
            except Exception as ex:
                print(f"Exception while parsing root from omexml: {ex}")
                continue  # skip this file; root would otherwise be undefined

        # files containing a BinaryOnly element are not the ome-tiff master
        if not tag_search(root, "BinaryOnly"):
            ome_master = file
            break

bryantChhun avatar Feb 10 '21 02:02 bryantChhun

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 01 '23 01:04 github-actions[bot]