vision icon indicating copy to clipboard operation
vision copied to clipboard

Deform conv2d mps support

Open goldfishsound opened this issue 1 year ago • 6 comments

DeformConV2 MPS Support

This PR feature a full MPS implementation of the DeformConV2 operator. 

  1. Generel notes For consistency and maintainability, this implementation is in many ways similar to the CPU and CUDA implementations, with the obvious exceptions relating to the difference in their frameworks. The Metal part of the implementation is likewise similar to the implementations for the “ROI” related kernel implementations.

  2. Tests The implementation passes all tests in test_ops.py:TestDeformConv except for the two optests.generate_opcheck_test:     "test_autograd_registration" and     "test_aot_dispatch_dynamic"

    "test_autograd_registration" fails due to the missing MPS dispatch key in torch/testing/_internal/optests/autograd_registration.py I have addressed this issue in a separate PR:  torch.testing._internal.optests - MPS Support #151758

    test_aot_dispatch_dynamic fails due to issues that are not yet clear to me. 

  3. Issues - To be fixed The CPU implementation: deform_conv2d_kernel.cpp is using the in-placed torch operator: .addmm. However deform_conv2d_kernel.mm.  However, for reasons unknown to me, using .addmm in the MPS implementation, returns zero-value tensors after the first iteration in the convolution loop.  As a temporary solution, I have chosen to use the out-of-place version: addmm instead. This is not ideal and should be fixed.

  4. MSL implementation - mps_kernels.h and mps_helpers.h implementations.  The implementation of the bilinear_interpolate function used by the “ROI”-related kernels is different from that used by the CPU and CUDA implementations of the deform_conv2 operator.  Currently, I have chosen to keep both implementations in mps_kernels.h and named the function: bilinear_interpolate_deform_conv2.

  5. Suggestions However, I suggest that mps_kernels.h be split into separate kernels, one for each ”ROI”-related operator, and one for the “deform_conv2” operator. Future implementations of ops should have their own separate kernel files. This will not only be in keeping with the implementation design in Pytorch but also lead to safer and more maintainable code in the long run.  I also suggest that any common utility functions and constants found in mps_kernels.h be moved into mps_helpers.h in the future. Also, maybe consider renaming mps_helpers.h to a more generic name.

goldfishsound avatar Apr 20 '25 13:04 goldfishsound

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9026

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Apr 20 '25 13:04 pytorch-bot[bot]

Didn't find following labels among repository labels: topic: not user facing

pytorch-bot[bot] avatar Apr 20 '25 13:04 pytorch-bot[bot]

@pytorchbot label "enhancement" @pytorchbot label "module: ops" @pytorchbot label "module: vision"

goldfishsound avatar Apr 20 '25 14:04 goldfishsound

@pytorchbot label "topic: not user facing"

goldfishsound avatar Apr 21 '25 08:04 goldfishsound

Didn't find following labels among repository labels: topic: not user facing

pytorch-bot[bot] avatar Apr 21 '25 08:04 pytorch-bot[bot]

I test it,it is very slow: Running on cpu Time: 0.005s Output: torch.Size([1, 8, 128, 128])

Running on mps Time: 0.292s Output: torch.Size([1, 8, 128, 128]) I built it from source code.

chenjingcheng avatar Apr 25 '25 07:04 chenjingcheng