PyTorch model with flexible image size
🐞Describe the bug
Unable to make the input size flexible when converting from PyTorch. I think it's related to https://github.com/apple/coremltools/issues/890, https://github.com/apple/coremltools/issues/880 and https://github.com/apple/coremltools/issues/756. I extend those cases to an `ImageType` input and provide the simplest possible script to reproduce the whole process.
- Using `RangeDim` to make the input flexible, the way the documentation describes, just does not work at all! Conversion fails with:
File "/Users/user/miniconda3/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 1336, in _get_scales_from_output_size
scales_h = (output_size[0] + 1e-4) / float(input_shape[-2])
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
- The very same network CAN work with a flexible input size if the sizes are enumerated OR an `MLMultiArray` input is used (see the sketch after the trace). So the model is correct, and the framework is able to execute it at any size, but only if the sizes are enumerated in advance. This leads me to think there is some flag or typo that makes the input range get ignored.
Trace:
```
  in predict
    return self.__proxy__.predict(data, useCPUOnly)
RuntimeError: {
    NSLocalizedDescription = "Error binding image input buffer input.";
}
```
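For the enumerated path, the sizes can also be declared directly at conversion time; a minimal sketch, assuming the coremltools 4.0 `ct.EnumeratedShapes` API and reusing the `traced.pt` produced by the script below (the size list and file names are just examples):

```python
import torch
import coremltools as ct

traced = torch.jit.load('traced.pt')
# enumerate the allowed input shapes up front; 'default' is the shape
# used when the caller doesn't request another one
shapes = ct.EnumeratedShapes(
    shapes=[[1, 3, s, s] for s in (128, 256, 512)],
    default=[1, 3, 256, 256])
model = ct.convert(
    traced,
    inputs=[ct.ImageType(name='input', shape=shapes)])
model.save('coreml_enumerated.mlmodel')
```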
To Reproduce
```python
def env_info():
    import sys
    print('python version:', sys.version)
    import torch
    print('PyTorch version:', torch.__version__)
    import coremltools
    print('coremltools version:', coremltools.__version__)


# make the simplest possible super-resolution model
def export():
    print('=' * 24, 'export', '=' * 24)
    import torch
    from torch import nn
    from torch.nn import functional as F

    class SuperNet(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x):
            return F.interpolate(x, scale_factor=(2, 2))

    model = SuperNet().eval()
    # the image size here doesn't matter, it can be anything
    traced = torch.jit.trace(model, torch.rand((1, 3, 1, 1)))
    torch.jit.save(traced, 'traced.pt')


INPUT_NODE = 'input'
OUTPUT_NODE = '30'
MODEL_DEFAULT_INPUT_SIZE = (256, 256)
IMAGE_SIZE = (512, 512)


def convert():
    global OUTPUT_NODE
    print('=' * 24, 'convert', '=' * 24)
    import torch
    import coremltools as ct
    from coremltools.models.neural_network import flexible_shape_utils
    import coremltools.proto.FeatureTypes_pb2 as ft
    from coremltools.models.neural_network.builder import NeuralNetworkBuilder

    traced = torch.jit.load('traced.pt')
    # the way from the documentation just throws:
    # TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
    # input_shape = ct.Shape(shape=[1, 3, ct.RangeDim(256, 1024, 256, 'w'), ct.RangeDim(256, 1024, 256, 'h')])
    # OK, let's convert with a fixed shape and add flexibility later
    input_shape = ct.Shape(shape=[1, 3, 256, 256])
    converted = ct.convert(
        traced,
        inputs=[ct.ImageType(name=INPUT_NODE, shape=input_shape)])
    # it doesn't matter how the spec is obtained, the result is the same = crash
    # converted.save('coreml.mlmodel')
    # spec = ct.utils.load_spec('coreml.mlmodel')
    spec = converted.get_spec()
    OUTPUT_NODE = spec.description.output[0].name
    # make the input flexible
    size_range = flexible_shape_utils.NeuralNetworkImageSizeRange()
    size_range.add_height_range((128, 512))  # also tried upper_bound=-1, doesn't work
    size_range.add_width_range((128, 512))
    flexible_shape_utils.update_image_size_range(spec, INPUT_NODE, size_range=size_range)
    # the only working way to get flexible sizes is to enumerate them all,
    # but that doesn't make much sense for a super-resolution task in our case
    # flexible_shape_utils.add_enumerated_image_sizes(spec, INPUT_NODE, sizes=[
    #     flexible_shape_utils.NeuralNetworkImageSize(x, x) for x in [128, 256, 512]])
    # make the output an image, with a flexible size as well
    feature = flexible_shape_utils._get_feature(spec, OUTPUT_NODE)
    feature.type.imageType.colorSpace = ft.ImageFeatureType.RGB
    # the output size can be specified, but it doesn't change anything
    # feature.type.imageType.width = 512
    # feature.type.imageType.height = 512
    size_range = flexible_shape_utils.NeuralNetworkImageSizeRange()
    size_range.add_height_range((256, 1024))
    size_range.add_width_range((256, 1024))
    flexible_shape_utils.update_image_size_range(spec, feature_name=OUTPUT_NODE, size_range=size_range)
    # print the model
    builder = NeuralNetworkBuilder(spec=spec)
    print('inputs')
    builder.inspect_input_features()
    print('model')
    builder.inspect_layers(verbose=True)
    print('outputs')
    builder.inspect_output_features()
    updated = ct.models.MLModel(spec)
    updated.save('coreml.mlmodel')


def test():
    print('=' * 24, 'test', '=' * 24)
    import coremltools as ct
    from PIL import Image
    import numpy as np

    print('input image size =', IMAGE_SIZE)
    arr = np.zeros([IMAGE_SIZE[0], IMAGE_SIZE[1], 3], dtype=np.uint8)
    img = Image.fromarray(arr)
    model = ct.models.MLModel('coreml.mlmodel')
    res = model.predict({INPUT_NODE: img})[OUTPUT_NODE]
    print('result image size =', res.size)


env_info()
export()
convert()
test()
```
System environment (please complete the following information):
- coremltools version: 4.0
- OS: macOS
- macOS: Big Sur 11.0 Beta (20A5395g)
- Xcode version: 12.0.1 (12A7300)
- How you installed Python: anaconda
- python version: 3.8.3
- PyTorch version: 1.6.0
This is a critical bug, because some models just do not make much sense with a fixed (or even several enumerated) size!
I will be happy to provide any additional info required. Please, any help/workaround is appreciated!
It doesn't work either when I leave the output node untouched (i.e. don't specify it as an image type and don't set a flexible size).
@aseemw I tested with a MultiArray input with a dynamic size – it works just perfectly (while keeping the output as an image):

```python
size_l = 128
size_u = 2048
flexible_shape_utils.set_multiarray_ndshape_range(spec, INPUT_NODE, [1, 3, size_l, size_l], [1, 3, size_u, size_u])
```

Here's the updated script: https://gist.github.com/gordinmitya/96a1b041bea18add4ec2e31907d11100
Any ideas?
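Expanded into a full conversion, the workaround looks roughly like this; a sketch only, reusing `traced.pt` from the repro script — the gist above is the complete version:

```python
import torch
import coremltools as ct
from coremltools.models.neural_network import flexible_shape_utils
import coremltools.proto.FeatureTypes_pb2 as ft

traced = torch.jit.load('traced.pt')
# convert with a TensorType (MLMultiArray) input instead of an ImageType
converted = ct.convert(
    traced,
    inputs=[ct.TensorType(name='input', shape=(1, 3, 256, 256))])
spec = converted.get_spec()
# allow any input between 128 and 2048 pixels per side
flexible_shape_utils.set_multiarray_ndshape_range(
    spec, 'input', lower_bounds=[1, 3, 128, 128], upper_bounds=[1, 3, 2048, 2048])
# keep the output as a flexible-size RGB image, as in the repro script
out_name = spec.description.output[0].name
feature = flexible_shape_utils._get_feature(spec, out_name)
feature.type.imageType.colorSpace = ft.ImageFeatureType.RGB
out_range = flexible_shape_utils.NeuralNetworkImageSizeRange()
out_range.add_height_range((256, 4096))
out_range.add_width_range((256, 4096))
flexible_shape_utils.update_image_size_range(spec, feature_name=out_name, size_range=out_range)
ct.models.MLModel(spec).save('coreml_multiarray.mlmodel')
```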
I have exactly the same problem, and I see that the Core ML runtime basically considers only the imageType width and height values and ignores the imageSizeRange. I also tried removing the imageType width and height values; the model is apparently exported successfully that way, but the runtime then fails to compile it.
These are the input/output specs for my model:
(name: "image"
type {
imageType {
width: 1024
height: 1024
colorSpace: RGB
imageSizeRange {
widthRange {
lowerBound: 256
upperBound: 1024
}
heightRange {
lowerBound: 256
upperBound: 1024
}
}
}
},
name: "460"
type {
imageType {
width: 1024
height: 1024
colorSpace: RGB
imageSizeRange {
widthRange {
lowerBound: 256
upperBound: 1024
}
heightRange {
lowerBound: 256
upperBound: 1024
}
}
}
})
As said, any image with a size different from 1024x1024 fails with a "coreml Error binding image input buffer image: -7" message.
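A quick way to see the failure boundary from Python; a hypothetical check, assuming the model with the spec above was saved as `model.mlmodel` and the input is named "image":

```python
import numpy as np
from PIL import Image
import coremltools as ct

model = ct.models.MLModel('model.mlmodel')  # hypothetical path to the model above
for side in (256, 512, 1024):
    img = Image.fromarray(np.zeros((side, side, 3), dtype=np.uint8))
    try:
        model.predict({'image': img})
        print(side, 'ok')  # per the report, only the default 1024 succeeds
    except RuntimeError as e:
        print(side, 'failed:', e)
```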
I appear to be having the same issue #992
I was able to work around the bug by using a MultiArray input and a flexible-shape image output (as you mention). I am converting my input UIImage to an MLMultiArray using Accelerate's vImageConvert_ARGB8888toPlanarF; the overhead of the conversion is quite low for my purposes.
Obviously not as ideal as flexible input images just working, but for me at least it is a working solution until these bugs are (hopefully) fixed, and at least I can move on for now.
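For checking the same MultiArray-input model from Python, the equivalent preprocessing is just an HWC→CHW transpose before predict; a hypothetical sketch with placeholder names, not the Swift vImage code described above:

```python
import numpy as np
from PIL import Image
import coremltools as ct

model = ct.models.MLModel('coreml_multiarray.mlmodel')  # hypothetical MultiArray-input model
img = Image.open('input.png').convert('RGB')            # hypothetical input file
# HWC uint8 -> 1x3xHxW float32: the planar layout the MultiArray input expects
arr = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None]
res = model.predict({'input': arr})
```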
@3DTOPO Thank you for the workaround with Accelerate! Do you think that is the way Core ML converts UIImage internally? And could you provide a small snippet showing how to use vImageConvert_ARGB8888toPlanarF in Swift?
You're welcome! Possibly. Please see the end of this thread for the solution: https://github.com/hollance/CoreMLHelpers/issues/5#issuecomment-726021906
Hi, apologies if I'm hijacking.
If you use enumerated shapes, does it work with the ANE, @gordinmitya? I got it working with enumerated shapes on a trivial/minimal network, and while CPU & GPU work fine, the ANE still refuses.
@alexrkr I haven't gotten that far; I need to get the model running at all first. So in your case the model works on the ANE when converted with a fixed size, but falls back to CPU/GPU when converted with enumerated sizes, right?
Exactly: when exported with enumerated sizes it works fine on CPU/GPU but not on the ANE. A fixed size works fine on all devices. I was hoping to find a reference in Apple's docs, or someone who has gotten the ANE to work with a flexible-input network; I'm assuming that's possible.
Mitya, did you find a solution? I also can't use a flexible input in my app; it returns "Error binding image input buffer image: -7".
Many thanks!
I'm trying to wrap up development of an update that I've spent 2 years working on. Is this glaring bug ever going to be addressed?
Otherwise I am facing shipping a product with a horrendous workaround for a feature that is supposed to be supported. I can't express how frustrating this issue is; this is one of the most critical toolchains for my app development.
> @aseemw I tested with a MultiArray input with a dynamic size – it works just perfectly (while keeping the output as an image):
>
> `size_l = 128; size_u = 2048; flexible_shape_utils.set_multiarray_ndshape_range(spec, INPUT_NODE, [1,3,size_l,size_l], [1,3,size_u,size_u])`
>
> Here's the updated script: https://gist.github.com/gordinmitya/96a1b041bea18add4ec2e31907d11100
>
> Any ideas?
Hi! I can't get this to work either. Is there any other hack?