Specify which dimension contains image channel in `AnchorImage`
In `AnchorImage` the channel dimension is assumed to be the last dimension, as defined here. This is not necessarily true; for example, in an MNIST handwritten-digit PyTorch model the shape of the data is (1, 28*28).
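For concreteness, here is a minimal numpy illustration (not alibi code) of the two common layout conventions for an MNIST-sized batch, and how one maps to the other:

```python
import numpy as np

# A batch of 16 single-channel 28x28 images under the two common layouts.
batch_nhwc = np.zeros((16, 28, 28, 1))  # channel-last (tensorflow default)
batch_nchw = np.zeros((16, 1, 28, 28))  # channel-first (torch default)

# Converting between the two layouts is a single axis move.
converted = np.moveaxis(batch_nchw, 1, -1)
print(converted.shape)  # (16, 28, 28, 1)
```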
Not sure if this is applicable to other explainers as well, so the fix may need to be applied to all relevant explainers.
Let's have this as a configurable parameter at object init?
The easiest solution would be to introduce another argument with a default value, e.g. `layout='nhwc'`. This seems relevant as "image metadata" for explainers (and detectors?) that need this info @arnaudvl @ascillitoe .
Annoyingly, it seems the latest release, `scikit-image==0.18.3`, doesn't support any layout other than channel-last; however, in the upcoming version 0.19 this is added and can be controlled via an argument. Fixing this in the meantime would require a bit of custom reshaping.
It seems like the 0.19 release of scikit-image should not be too far away now (https://github.com/scikit-image/scikit-image/issues/5169), so it's tempting to wait and upgrade rather than do array conversions ourselves only to remove them later. Otherwise we would have to reimplement the logic of this decorator that patches the skimage segmentation functions which can only handle channel-last: https://github.com/scikit-image/scikit-image/blob/8a13f15b7629a3be643fe0eb45f6ede55ba69c84/skimage/_shared/utils.py#L219
There are cases, e.g. sklearn MNIST, that do not even have a channel dimension. Basically, grayscale images are flattened to 2D: `(samples, features)`.
Yup, the standard response is to wrap the predictor so it's of the form that alibi expects (i.e. in this case adding a dummy channel dimension).
In this particular case it should be easy to support this internally (e.g. check image_shape, if 2D raise a warning and construct the wrapped predictor internally).
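A sketch of how such an internal wrapper could look; `wrap_flat_predictor` is a hypothetical helper name, not existing alibi API:

```python
import warnings
import numpy as np

def wrap_flat_predictor(predictor, image_shape):
    """Hypothetical helper: if image_shape is 2D (H, W), wrap a predictor
    that expects flattened (samples, features) input so it accepts
    (samples, H, W, 1) images with a dummy channel dimension."""
    if len(image_shape) == 2:
        warnings.warn("2D image_shape detected; adding a dummy channel dimension.")
        h, w = image_shape

        def wrapped(x):
            # drop the dummy channel and flatten back to (samples, features)
            return predictor(x.reshape(x.shape[0], h * w))

        return wrapped, image_shape + (1,)
    return predictor, image_shape

# usage with a toy sklearn-style predictor operating on flattened input
flat_predictor = lambda x: x.sum(axis=1)
pred, shape = wrap_flat_predictor(flat_predictor, (28, 28))
print(shape)                                   # (28, 28, 1)
print(pred(np.ones((2, 28, 28, 1))).tolist())  # [784.0, 784.0]
```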
This is actually the same type of issue as #516, attempting to push the complexity of wrapping non-compliant predictors internally as opposed to keeping it a responsibility of the user.
Since scikit-image 0.19 has been released (#551 ), this is unblocked. A fix would then need to bump the minimum required version of scikit-image to 0.19.
I've thought a little about this and there are two main choices, both requiring an extra argument:
- `image_layout: Literal['nhwc', 'nchw'] = 'nhwc'`
  - By default have `nhwc`, which is the `tensorflow` default image shape; also support the other popular layout `nchw` often used with `torch`
  - Could be extended in the future to include other permutations
- `channel_axis: int = -1`
  - By default treat the last axis as the channel
  - For the `nchw` format `channel_axis` would have to be `1` (since `0` is for batch, and using strings such as `'first'` and `'last'` might be confusing?)
  - Perhaps less flexible than `image_layout` because this would not be extensible to support more unusual layouts (e.g. height and width axes swapped around)
  - `channel_axis` is the way multi-channel images are treated in `scikit-image` from version 0.19: https://github.com/scikit-image/scikit-image/releases/tag/v0.19.0
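To illustrate how either argument could be used internally, here is a sketch: the `LAYOUT_TO_CHANNEL_AXIS` mapping and `to_channel_last` helper below are assumptions about a possible implementation, showing that an `image_layout` value reduces to a `channel_axis` under the hood:

```python
import numpy as np

# Hypothetical mapping from the proposed image_layout values to the
# channel_axis convention used by scikit-image >= 0.19.
LAYOUT_TO_CHANNEL_AXIS = {'nhwc': -1, 'nchw': 1}

def to_channel_last(images, image_layout='nhwc'):
    """Move the channel axis to the last position so downstream
    channel-last code (e.g. skimage segmentation) can run unchanged."""
    axis = LAYOUT_TO_CHANNEL_AXIS[image_layout]
    if axis in (-1, images.ndim - 1):
        return images
    return np.moveaxis(images, axis, -1)

x_nchw = np.zeros((8, 3, 32, 32))
print(to_channel_last(x_nchw, 'nchw').shape)  # (8, 32, 32, 3)
```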
I think `image_layout` has a slight edge due to being more general and extensible, but it would be great to hear another opinion @RobertSamoilescu @ascillitoe.
EDIT: it's also possible we stick with conventions like `skimage` and go the `channel_axis` route, see more discussion here: https://scikit-image.org/docs/stable/user_guide/numpy_images.html#coordinate-conventions
EDIT2: in the future, supporting volumetric images may require even more conventions like `skimage`'s, which could be more easily managed by `image_layout` instead
EDIT3: including the batch dimension `n` in `image_layout` may be slightly controversial, as it's not part of the image layout and also (for the time being) `AnchorImage` operates only on single images. On the other hand, the implicit assumption then is that the underlying predictor has batch as the leading dimension (but this is an assumption we've made throughout the library).
Mmmn, a tricky one. I tend to agree that `image_layout` is nicer, since it is more flexible and could be extended in the future. I also think that `channel_axis` might be confusing due to the ambiguity over whether the batch dimension `n` is included in the integer count.
Maybe `image_layout` could also aid scenarios like `(samples, features)` that @sakoush mentioned. If numeric values were given, i.e. `image_layout='n,h32,w32'` (we'd need to think about formatting rules), this would help us avoid needing to infer anything when reshaping the flattened feature array. Or is this getting too complicated now?!
@ascillitoe I would avoid putting in extra functionality, especially something that would be a new custom format. We already have an `image_shape` argument, which is also a standard concept in ML.
Yeh that's fair enough. Even without a need for the added flexibility of `image_layout` right now, I agree that it is easier to interpret, and might be useful for volumetric images etc. in the future. I don't have a super strong opinion though.
PyTorch example for AnchorImage.
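Until a layout argument lands, a PyTorch model can be used with `AnchorImage` by transposing inside the predictor function. The sketch below uses a stand-in callable in place of a real `torch` model (so it runs without torch installed); the `make_predictor` name is hypothetical:

```python
import numpy as np

# Sketch: AnchorImage currently assumes channel-last (N, H, W, C) input,
# while a PyTorch model expects channel-first (N, C, H, W). Until a
# layout/channel_axis argument exists, the predictor can transpose internally.
def make_predictor(model):
    def predictor(x_nhwc):
        x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # NHWC -> NCHW
        # with torch this would be e.g.:
        #   model(torch.as_tensor(x_nchw, dtype=torch.float32)).detach().numpy()
        return model(x_nchw)
    return predictor

# stand-in "model" that just reports the shape it received
model = lambda x: x.shape
predictor = make_predictor(model)
print(predictor(np.zeros((4, 28, 28, 1))))  # (4, 1, 28, 28)
```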