Specify which dimension contains image channel in `AnchorImage`
In `AnchorImage` the channel dimension is assumed to be the last dimension, as defined here. This is not necessarily true; for example, in an MNIST handwritten-digit PyTorch model the shape of the data is (1, 28*28).
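For concreteness, here is a minimal numpy illustration (not alibi code) of the two common layout conventions for an MNIST-sized batch, and how one maps to the other:

```python
import numpy as np

# A batch of 16 single-channel 28x28 images under the two common layouts.
batch_nhwc = np.zeros((16, 28, 28, 1))  # channel-last (tensorflow default)
batch_nchw = np.zeros((16, 1, 28, 28))  # channel-first (torch default)

# Converting between the two layouts is a single axis move.
converted = np.moveaxis(batch_nchw, 1, -1)
print(converted.shape)  # (16, 28, 28, 1)
```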
Not sure if this is applicable to other explainers as well, so the fix may need to be applied to all relevant explainers.
Let's have this as a configurable parameter at object init?
The easiest solution would be to introduce another argument with a default value, e.g. `layout='nhwc'`. This seems relevant as "image metadata" for explainers (and detectors?) that need this info @arnaudvl @ascillitoe .
Annoyingly, it seems the latest release, `scikit-image==0.18.3`, doesn't support any layout other than channel-last; however, in the upcoming version 0.19 this is added and can be controlled via an argument. Fixing this in the meantime would require a bit of custom reshaping.
It seems like the 0.19 release of scikit-image should not be too far away now (https://github.com/scikit-image/scikit-image/issues/5169), so it's tempting to wait and upgrade rather than do array conversions ourselves only to remove them later. Otherwise we would have to reimplement the logic of this decorator that patches the skimage segmentation functions which can only handle channel-last: https://github.com/scikit-image/scikit-image/blob/8a13f15b7629a3be643fe0eb45f6ede55ba69c84/skimage/_shared/utils.py#L219
There are cases, e.g. sklearn MNIST, that do not even have a channel dimension. Basically, grayscale images are flattened to 2D: `(samples, features)`.
Yup, the standard response is to wrap the predictor so it's of the form that alibi expects (i.e. in this case adding a dummy channel dimension).
In this particular case it should be easy to support this internally (e.g. check image_shape, if 2D raise a warning and construct the wrapped predictor internally).
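A sketch of how such an internal wrapper could look; `wrap_flat_predictor` is a hypothetical helper name, not existing alibi API:

```python
import warnings
import numpy as np

def wrap_flat_predictor(predictor, image_shape):
    """Hypothetical helper: if image_shape is 2D (H, W), wrap a predictor
    that expects flattened (samples, features) input so it accepts
    (samples, H, W, 1) images with a dummy channel dimension."""
    if len(image_shape) == 2:
        warnings.warn("2D image_shape detected; adding a dummy channel dimension.")
        h, w = image_shape

        def wrapped(x):
            # drop the dummy channel and flatten back to (samples, features)
            return predictor(x.reshape(x.shape[0], h * w))

        return wrapped, image_shape + (1,)
    return predictor, image_shape

# usage with a toy sklearn-style predictor operating on flattened input
flat_predictor = lambda x: x.sum(axis=1)
pred, shape = wrap_flat_predictor(flat_predictor, (28, 28))
print(shape)                                   # (28, 28, 1)
print(pred(np.ones((2, 28, 28, 1))).tolist())  # [784.0, 784.0]
```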
This is actually the same type of issue as #516, attempting to push the complexity of wrapping non-compliant predictors internally as opposed to keeping it a responsibility of the user.
Since scikit-image 0.19 has been released (#551 ), this is unblocked. A fix would then need to bump the minimum required version of scikit-image to 0.19.
I've thought a little about this and there are two main choices, both requiring an extra argument:
- `image_layout: Literal['nhwc', 'nchw'] = 'nhwc'`
  - By default have `nhwc`, which is the `tensorflow` default image shape; also support the other popular layout `nchw` often used with `torch`
  - Could be extended in the future to include other permutations
- `channel_axis: int = -1`
  - By default treat the last axis as the channel
  - For the `nchw` format `channel_axis` would have to be `1` (since `0` is for batch, and using strings such as `'first'` and `'last'` might be confusing?)
  - Perhaps less flexible than `image_layout` because this would not be extensible to support more unusual layouts (e.g. height and width axes swapped around)
  - `channel_axis` is the way multi-channel images are treated in `scikit-image` from version 0.19: https://github.com/scikit-image/scikit-image/releases/tag/v0.19.0
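To illustrate how either argument could be used internally, here is a sketch: the `LAYOUT_TO_CHANNEL_AXIS` mapping and `to_channel_last` helper below are assumptions about a possible implementation, showing that an `image_layout` value reduces to a `channel_axis` under the hood:

```python
import numpy as np

# Hypothetical mapping from the proposed image_layout values to the
# channel_axis convention used by scikit-image >= 0.19.
LAYOUT_TO_CHANNEL_AXIS = {'nhwc': -1, 'nchw': 1}

def to_channel_last(images, image_layout='nhwc'):
    """Move the channel axis to the last position so downstream
    channel-last code (e.g. skimage segmentation) can run unchanged."""
    axis = LAYOUT_TO_CHANNEL_AXIS[image_layout]
    if axis in (-1, images.ndim - 1):
        return images
    return np.moveaxis(images, axis, -1)

x_nchw = np.zeros((8, 3, 32, 32))
print(to_channel_last(x_nchw, 'nchw').shape)  # (8, 32, 32, 3)
```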
I think `image_layout` has a slight edge due to being more general and extensible, but it would be great to hear another opinion @RobertSamoilescu @ascillitoe.
EDIT: it's also possible we stick with conventions like `skimage` and go the `channel_axis` route, see more discussion here: https://scikit-image.org/docs/stable/user_guide/numpy_images.html#coordinate-conventions
EDIT2: in the future, supporting volumetric images may require even more conventions like `skimage`'s, which could be more easily managed by `image_layout` instead
EDIT3: including the batch dimension `n` in `image_layout` may be slightly controversial, as it's not part of the image layout and also (for the time being) `AnchorImage` operates only on single images. On the other hand, the implicit assumption then is that the underlying predictor has batch as the leading dimension (but this is an assumption we've made throughout the library).
Mmmn, a tricky one. I tend to agree that `image_layout` is nicer, since it is more flexible and could be extended in the future. I also think that `channel_axis` might be confusing due to the ambiguity over whether the batch dimension `n` is included in the integer count.
Maybe `image_layout` could also aid scenarios like `(samples, features)` that @sakoush mentioned. If numeric values were given, i.e. `image_layout='n,h32,w32'` (we'd need to think about formatting rules), this would help us avoid needing to infer anything when reshaping the flattened feature array. Or is this getting too complicated now?!
@ascillitoe I would avoid putting in extra functionality, especially something that would be a new custom format. We already have an `image_shape` argument, which is also a standard concept in ML.
Yeh that's fair enough. Even without a need for the added flexibility of `image_layout` right now, I agree that it is easier to interpret, and might be useful for volumetric images etc. in the future. I don't have a super strong opinion though.
PyTorch example for AnchorImage.
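Until a layout argument lands, a PyTorch model can be used with `AnchorImage` by transposing inside the predictor function. The sketch below uses a stand-in callable in place of a real `torch` model (so it runs without torch installed); the `make_predictor` name is hypothetical:

```python
import numpy as np

# Sketch: AnchorImage currently assumes channel-last (N, H, W, C) input,
# while a PyTorch model expects channel-first (N, C, H, W). Until a
# layout/channel_axis argument exists, the predictor can transpose internally.
def make_predictor(model):
    def predictor(x_nhwc):
        x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # NHWC -> NCHW
        # with torch this would be e.g.:
        #   model(torch.as_tensor(x_nchw, dtype=torch.float32)).detach().numpy()
        return model(x_nchw)
    return predictor

# stand-in "model" that just reports the shape it received
model = lambda x: x.shape
predictor = make_predictor(model)
print(predictor(np.zeros((4, 28, 28, 1))))  # (4, 1, 28, 28)
```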