
Support for deform_conv2d operation from PyTorch

Volutionn opened this issue on Jun 24 '23 · 19 comments

  • Name of layer type: deform_conv2d

  • Is this a PyTorch or a TensorFlow layer type: PyTorch

  • Your version of coremltools: 7.0b1

  • Your version of PyTorch/TensorFlow: PyTorch 2.0.1

  • Impact of supporting this layer type. Why is adding support for this layer type important? Is it necessary to support a popular model or use case? Deformable Convolution, as implemented in the torchvision.ops.deform_conv2d operator in PyTorch, is a key technique that allows Convolutional Neural Networks to adapt to complex spatial transformations in input data. It enhances the model's performance in tasks that require understanding spatial hierarchies and relationships, such as object detection, image segmentation, and image restoration. The lack of support for this operation currently blocks the conversion of my model.


I was wondering if there are any plans to implement support for the deform_conv2d operation in a future release of CoreML? If support for deform_conv2d is not planned, could you provide any advice or workarounds for dealing with this issue? Any guidance would be greatly appreciated.

Thank you for your time and for the excellent work you do on the CoreML project!

Volutionn avatar Jun 24 '23 08:06 Volutionn

Thank you for filing this feature request!

Could you provide a minimum code snippet that contains deform_conv2d to reproduce the issue? Thanks!

Meanwhile, I would also recommend adding support for this op on your end using composite operators: https://coremltools.readme.io/docs/composite-operators
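
For reference, the registration pattern from that doc looks like this (this is the documented selu composite from the coremltools docs; deform_conv2d would need its offset-based sampling expressed with MIL ops in the same way):

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

# Composite-op pattern: decompose a missing PyTorch op into existing
# MIL ops. selu decomposes cleanly into elu followed by a scale.
@register_torch_op
def selu(context, node):
    x = context[node.inputs[0]]
    x = mb.elu(x=x, alpha=1.6732632423543772)
    x = mb.mul(x=x, y=1.0507009873554805, name=node.name)
    context.add(x)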

Thanks!

junpeiz avatar Jun 27 '23 17:06 junpeiz

Thank you for your reply!

As requested, here's a minimum code snippet that contains deform_conv2d:

import torch
from torchvision.ops import deform_conv2d
import coremltools as ct

class DeformConv2DModel(torch.nn.Module):
    def __init__(self):
        super(DeformConv2DModel, self).__init__()
        self.kh, self.kw = 3, 3
        self.weight = torch.nn.Parameter(torch.rand(5, 3, self.kh, self.kw))

    def forward(self, x, offset, mask):
        out = deform_conv2d(x, offset, self.weight, mask=mask)
        return out

# Define the model
model = DeformConv2DModel()

# Create random inputs; with a 3x3 kernel and no padding, the output
# spatial size is (H - 2, W - 2), hence the offset/mask shapes below
input_tensor = torch.rand(4, 3, 10, 10)
offset = torch.rand(4, 2 * model.kh * model.kw, input_tensor.shape[2] - 2, input_tensor.shape[3] - 2)  # (N, 2*kh*kw, 8, 8)
mask = torch.rand(4, model.kh * model.kw, input_tensor.shape[2] - 2, input_tensor.shape[3] - 2)  # (N, kh*kw, 8, 8)

# Trace the model
traced_model = torch.jit.trace(model, (input_tensor, offset, mask))

# Convert to Core ML
coreml_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input", shape=input_tensor.shape),
            ct.TensorType(name="offset", shape=offset.shape),
            ct.TensorType(name="mask", shape=mask.shape)],
    source='pytorch',
)
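
When run with coremltools 7.0b1, the ct.convert call is where the conversion fails, since the converter has no implementation registered for the torchvision::deform_conv2d op.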

Volutionn avatar Jul 02 '23 13:07 Volutionn

Hello, I was wondering if there's any update regarding support for the deform_conv2d operation? Thank you!

Volutionn avatar Aug 28 '23 05:08 Volutionn

Same demand for deform_conv2d here! Any update?

Feynman1999 avatar Oct 30 '23 10:10 Feynman1999

+1

bitanath avatar Nov 07 '23 10:11 bitanath

The PyTorch documentation for this op doesn't contain many details, and the PyTorch forward implementation for deform_conv2d seems quite complex. Can someone share the mathematical formulas for what this operation actually does?

TobyRoseman avatar Feb 27 '24 20:02 TobyRoseman

Thank you @TobyRoseman for looking into this.

Based on my understanding of the topic, the PyTorch implementation references the following two papers: Deformable ConvNets and Deformable ConvNets v2.

I've summarized the formulas below.

Deformable ConvNets:

  • Standard Convolution: $$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n)$$ The base convolution operation without deformable adjustments.

  • Deformable Convolution: $$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$ Enhances standard convolution by adding learnable offsets $\Delta p_n$, adapting to geometric variations.

  • Bilinear Interpolation for Deformable Convolution: $$x(p) = \sum_{q} G(q,p) \cdot x(q)$$ Here, $G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y)$ and $g(a,b) = \max(0,1-|a-b|)$ for sampling at non-integer locations.


Deformable ConvNets v2:

  • Modulated Deformable Convolution: $$y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k$$ Introduces modulation scalars $\Delta m_k$ alongside the learnable offsets, refining the influence of each sampled location (a code sketch follows this list).

  • Modulated Deformable RoI Pooling: $$y(k) = \frac{1}{n_k} \sum_{j=1}^{n_k} x(p_{kj} + \Delta p_k) \cdot \Delta m_k$$ Applies modulation and learnable offsets to RoI pooling for enhanced precision.
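
To make the modulated convolution formula concrete, here's a naive (deliberately slow) reference sketch based on my reading of the torchvision kernel. It assumes stride 1, no padding or dilation, and a single offset/weight group; naive_deform_conv2d and bilinear_sample are just illustrative names:

import torch
from torchvision.ops import deform_conv2d

def bilinear_sample(img, py, px):
    # Sample img (C, H, W) at fractional (py, px), zero outside the image;
    # implements x(p) = sum_q G(q, p) x(q) with g(a, b) = max(0, 1 - |a - b|)
    c, h, w = img.shape
    y0, x0 = int(torch.floor(py)), int(torch.floor(px))
    out = img.new_zeros(c)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                out += (1 - abs(py - yy)) * (1 - abs(px - xx)) * img[:, yy, xx]
    return out

def naive_deform_conv2d(x, offset, weight, mask=None):
    # Literal transcription of the (modulated) deformable convolution formula
    b, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = x.new_zeros(b, c_out, out_h, out_w)
    for bi in range(b):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(kh * kw):
                    ki, kj = divmod(k, kw)
                    # torchvision stores (dy, dx) interleaved per kernel tap
                    dy = offset[bi, 2 * k, i, j]
                    dx = offset[bi, 2 * k + 1, i, j]
                    val = bilinear_sample(x[bi], i + ki + dy, j + kj + dx)
                    m = mask[bi, k, i, j] if mask is not None else 1.0
                    out[bi, :, i, j] += (weight[:, :, ki, kj] @ val) * m
    return out

# Quick check against torchvision on small sizes (the loops are slow)
x = torch.rand(1, 3, 6, 6)
weight = torch.rand(4, 3, 3, 3)
offset = torch.rand(1, 18, 4, 4)
mask = torch.rand(1, 9, 4, 4)
print(torch.allclose(deform_conv2d(x, offset, weight, mask=mask),
                     naive_deform_conv2d(x, offset, weight, mask=mask),
                     atol=1e-5))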

Volutionn avatar Feb 28 '24 11:02 Volutionn

Thanks @Volutionn for the concise information. So the deform_conv2d PyTorch op uses just the "Deformable Convolution" formula, is that correct?

TobyRoseman avatar Feb 28 '24 21:02 TobyRoseman

From my understanding, the deform_conv2d operation in PyTorch supports both Deformable Convolution versions 1 and 2, including the modulated formulas. When the mask parameter is None, it performs Deformable Convolution as described in the first Deformable ConvNets paper, using only the "Deformable Convolution" formula. If a mask is provided, it implements Deformable ConvNets v2, incorporating the modulation formulas. Bilinear interpolation is essential in both versions to handle fractional offsets.
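
For example, with shapes like those in my earlier snippet:

import torch
from torchvision.ops import deform_conv2d

x = torch.rand(1, 3, 10, 10)
weight = torch.rand(5, 3, 3, 3)
offset = torch.rand(1, 18, 8, 8)  # 2 * kh * kw offset channels
mask = torch.rand(1, 9, 8, 8)     # kh * kw modulation channels

out_v1 = deform_conv2d(x, offset, weight)             # v1: offsets only
out_v2 = deform_conv2d(x, offset, weight, mask=mask)  # v2: modulated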

Volutionn avatar Feb 29 '24 00:02 Volutionn

Hello everyone, I tried using deform_conv2d in coremltools v7.1, and got this error:

RuntimeError: PyTorch convert function for op 'torchvision::deform_conv2d' not implemented.

Are there any workarounds? I don't think it's a simple implementation that can be done with a custom layer in the MIL Builder. However, if anyone has ideas, I'm happy to work on trying to implement it; I just need a starting point.

The reason for this is that deform_conv2d is usually a drop-in replacement that improves loss by 20-30% on a CNN. It's pretty amazing, and it would be a great advantage to have it supported in deployed CoreML models.

bitanath avatar May 05 '24 05:05 bitanath

Just a bump. Is this issue dead? Please advise.

bitanath avatar May 09 '24 10:05 bitanath

@bitanath I imagine it's just a complex operator to add. I tried to implement it using composite operators, but without any success. Hopefully this hasn't been abandoned on @TobyRoseman's side; I've been hoping for it for almost a year. Agreed, it would be amazing to have it. Let's be patient; it's normal that it takes time.

Volutionn avatar May 13 '24 10:05 Volutionn

@Volutionn I implemented the deform_conv2d operation using CoreML custom layers:

https://github.com/dneprDroid/DeformConv2d-Metal

It's GPU-accelerated and supports both iOS and macOS. You can try the demo app and the example converter script to generate a CoreML model with custom layers.

dneprDroid avatar May 27 '24 13:05 dneprDroid

Wow, you made my day! That's amazing, thanks a lot for sharing it @dneprDroid 🙏🏻

Volutionn avatar May 27 '24 13:05 Volutionn

Thanks a lot @dneprDroid! This is pretty neat! I was also trying to implement this in Metal from the published CUDA implementation, but it seemed too hard.

Would you also be releasing the code for the shaders, purely for learning purposes? Or did I miss it somewhere?

Regardless, thanks a lot for this library! It's awesome!! Much appreciated ❤️

bitanath avatar May 28 '24 06:05 bitanath