
Adapt `TransposedConvolution` Layer

Open · ranjodhsingh1729 opened this pull request 6 months ago · 6 comments

Resolves: https://github.com/mlpack/mlpack/issues/3959

Description:

This PR updates the TransposedConvolution layer to mlpack’s new layer interface.

Changes

As suggested by @rcurtin, I followed a mostly straightforward translation to the new API.
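For context, here is a minimal sketch of how the adapted layer might be used; the constructor signature (and the parameter order) is my assumption, mirrored from the regular `Convolution` layer rather than taken from this PR:

```cpp
#include <mlpack.hpp>

using namespace mlpack;

int main()
{
  // A small decoder-style network: the transposed convolution upsamples
  // its input, inverting what a strided convolution would do.
  FFN<MeanSquaredError> model;

  // Assumed parameter order: (maps, kernelW, kernelH, strideW, strideH,
  // padW, padH), mirroring the regular Convolution layer.
  model.Add<TransposedConvolution>(8, 3, 3, 2, 2, 0, 0);
  model.InputDimensions() = std::vector<size_t>({ 14, 14, 1 });

  // With stride 2, kernel 3, and no padding, a 14x14 input maps to
  // 2 * (14 - 1) + 3 = 29, i.e. a 29x29 output per map.
  arma::mat input(14 * 14, 1, arma::fill::randu), output;
  model.Predict(input, output);
}
```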

I’ve added some tests: some ported from the legacy ann_layer_test.cpp, others adapted from the convolution layer tests. All tests currently pass, except those that depend on the remaining work.

Remaining Work

  • ~~Add support for outputPadding~~
  • ~~Implement alignment logic (aW and aH)~~
  • Handle output cropping when the computed padding becomes negative (e.g., in 'same' padding cases)

Feedback on the current direction would be much appreciated.

ranjodhsingh1729 · Jul 12 '25

Hi,

I’m wrapping up the last pieces of this PR for the transposed convolution layer and had a few open questions where I could use your input:

  1. **Output padding.** Older versions didn’t support output padding, but one of the existing tests seems to expect it. Do we want to support it officially now? And if so, should that be part of this PR?

  2. "Same" Padding with Stride > 1 When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input. Should we handle that explicitly?

  3. **Alignment parameters (aW, aH).** Earlier implementations used aW and aH, computed from a user-specified outputSize. Since we now compute the output size internally, these values always end up as zero. Is it safe to remove them?
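To make question 2 concrete, here is the arithmetic (mine, not the PR's) using the standard transposed-convolution size relation from Dumoulin & Visin's convolution-arithmetic guide:

```cpp
#include <iostream>

int main()
{
  // Transposed-convolution size relation (Dumoulin & Visin):
  //   out = stride * (in - 1) + kernel - 2 * pad
  const int in = 4, kernel = 3, stride = 2;

  // Reading 'same' as out == in forces
  //   pad = (stride * (in - 1) + kernel - in) / 2.
  const double pad = (stride * (in - 1) + kernel - in) / 2.0;
  std::cout << "required pad: " << pad << '\n';  // 2.5

  // In the equivalent direct convolution over the zero-upsampled input,
  // the border padding actually applied is (kernel - 1 - pad); once
  // pad > kernel - 1, that quantity is negative, so no amount of
  // zero-padding reaches out == in and the output must be cropped.
  std::cout << "effective border: " << kernel - 1 - pad << '\n';  // -0.5
}
```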

I initially planned to revisit these over the weekend, but they’re starting to feel out of scope, especially since the old code didn’t really address them either. I’d love your thoughts on the best way to move forward!

Thanks in advance!

ranjodhsingh1729 · Jul 20 '25

Sorry for the slow response @ranjodhsingh1729. I kickstarted the rest of the CI and restarted the cross-compilation job (hopefully that will succeed; if not, it's probably not your fault, as I've been fighting one of those systems for quite a while now).

> **Output padding.** Older versions didn’t support output padding, but one of the existing tests seems to expect it. Do we want to support it officially now? And if so, should that be part of this PR?

Which is the test that seems to expect output padding? As far as I can tell, outside of the tests, output padding is not required anywhere (I am looking in the models and examples repositories). So my inclination would be to leave it out and match the padding strategy and parameter names of the regular convolution layer (input padding only).

"Same" Padding with Stride > 1 When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input. Should we handle that explicitly?

I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness here than I'm seeing at the moment, but at first glance it does not seem so hard to handle.

> **Alignment parameters (aW, aH).** Earlier implementations used aW and aH, computed from a user-specified outputSize. Since we now compute the output size internally, these values always end up as zero. Is it safe to remove them?

Yes, the user should not need to specify the output sizes; those should now be computed from the input size and the kernel size (and the padding size). So if the result is that aW and aH are always zero, then I don't see any need to keep them.
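If I read the question right, aW and aH were whatever residual was needed to reach a user-specified output size; once the output size is itself defined by the size relation, that residual is zero by construction. A tiny sketch (my reading of the old interface, not verified against the legacy source):

```cpp
#include <cassert>
#include <cstddef>

// Output width from the transposed-convolution size relation.
size_t OutWidth(size_t in, size_t kernel, size_t stride, size_t pad)
{
  return stride * (in - 1) + kernel - 2 * pad;
}

int main()
{
  const size_t in = 7, kernel = 3, stride = 2, pad = 1;
  // Old interface: the user supplied outputWidth and the layer kept the
  // leftover needed to reach it.  New interface: outputWidth *is* the
  // relation's value, so the leftover is identically zero.
  const size_t outputWidth = OutWidth(in, kernel, stride, pad);
  const size_t aW = outputWidth - OutWidth(in, kernel, stride, pad);
  assert(aW == 0);
}
```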

I can't promise a fast review here (there is just too much else going on) but I will come back to this when I'm able to. I definitely am interested in an updated TransposedConvolution layer since that will re-enable the mnist_cnn_vae example in the examples repository (https://github.com/mlpack/examples).

rcurtin · Jul 30 '25

> I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness here than I'm seeing at the moment, but at first glance it does not seem so hard to handle.

I am currently getting memory-related errors while trying to use this in an autoencoder (by the way, good to know there is one I can adapt from the examples repository). I'll definitely try to handle this once everything else is working smoothly.

> I can't promise a fast review here (there is just too much else going on) but I will come back to this when I'm able to. I definitely am interested in an updated TransposedConvolution layer since that will re-enable the mnist_cnn_vae example in the examples repository (https://github.com/mlpack/examples).

No problem.

ranjodhsingh1729 · Jul 30 '25

> When paddingType == "same" and stride > 1, the computed padding can go negative, meaning the output sometimes needs to be cropped to match the input. Should we handle that explicitly?

> I have not looked at the implementation yet, but I think that we should handle that case, yes. Perhaps there is more trickiness here than I'm seeing at the moment, but at first glance it does not seem so hard to handle.

While ~~it may not be as tricky~~ :), it may be simpler to handle this in the padding layer. I'm personally not sure whether it *should* be handled in the padding layer, though. What do you think?

Edit: Initially I thought one could just crop the output to match the input. Revisiting this work, I realized that if I crop the output of the forward pass, the backward pass receives the cropped output; its dimensions then no longer match, the gradient is affected, and it becomes a mess. So I looked at how other libraries handle this, and found that TensorFlow's implementation of 'same' padding ensures output = stride * input rather than output = input. Plugging that into the equations, everything works out with no possibility of negative padding (unless the filter is larger than the input). So I can see only two reasonable solutions: don't support 'same' padding for stride > 1, or do what TensorFlow does.
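To sanity-check the TensorFlow convention mentioned above (my arithmetic, not the PR's code; a fractional value just means the padding is split unevenly between the two sides):

```cpp
#include <iostream>

int main()
{
  // TensorFlow-style 'same' for transposed convolution targets
  //   out = stride * in.
  // Substituting into out = stride * (in - 1) + kernel - 2 * pad gives
  //   2 * pad = kernel - stride,
  // which no longer depends on the input size at all, and is
  // non-negative whenever kernel >= stride.
  const int in = 4, kernel = 3, stride = 2;
  const double pad = (kernel - stride) / 2.0;
  std::cout << "pad: " << pad << '\n';                                   // 0.5
  std::cout << "out: " << stride * (in - 1) + kernel - 2 * pad << '\n';  // 8
}
```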

If we really do need to crop the output in this case, I'll try to figure that out, but I don't see a good approach yet. Any suggestions?

ranjodhsingh1729 · Aug 02 '25

> When paddingType == "same" and stride > 1, the computed padding can go negative.

For now, the `TransposedConvolution` layer throws an error in this case until we decide on a proper solution.
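For reference, a sketch of the kind of guard described; the function and parameter names here are hypothetical, not the PR's actual identifiers:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical guard: reject the unsupported configuration up front
// instead of silently computing a negative padding.
void CheckPaddingSupported(const std::string& paddingType,
                           const size_t strideW,
                           const size_t strideH)
{
  if (paddingType == "same" && (strideW > 1 || strideH > 1))
  {
    throw std::invalid_argument("TransposedConvolution: 'same' padding "
        "with stride > 1 is not yet supported!");
  }
}
```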

ranjodhsingh1729 · Sep 20 '25

Hi @nzachary! Sorry for the late response, and thanks for the suggestions. I've made the necessary changes (the corresponding changes to convolution_impl.hpp in #3988 made it really straightforward). Also, #3988 was cool work; I saw the difference in speed between the naive implementation and im2col.
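For readers following along: the im2col trick referenced from #3988 rewrites convolution as a single large matrix multiplication, which is where the speedup over the naive loop comes from. A minimal single-channel sketch of the idea (illustration only; #3988's actual implementation surely differs):

```cpp
#include <armadillo>

// im2col: copy every kernel-sized patch of the input into one column,
// so convolution becomes a single GEMM over the resulting matrix.
arma::mat Im2Col(const arma::mat& in, const size_t k, const size_t stride)
{
  const size_t outRows = (in.n_rows - k) / stride + 1;
  const size_t outCols = (in.n_cols - k) / stride + 1;
  arma::mat cols(k * k, outRows * outCols);
  size_t c = 0;
  for (size_t j = 0; j < outCols; ++j)
    for (size_t i = 0; i < outRows; ++i)
      cols.col(c++) = arma::vectorise(
          in.submat(i * stride, j * stride,
                    i * stride + k - 1, j * stride + k - 1));
  return cols;
}

int main()
{
  arma::mat image(8, 8, arma::fill::randu);
  arma::rowvec kernel(9, arma::fill::randu);  // 3x3 filter, column-major
  // One multiplication yields all 6x6 = 36 output pixels at once.
  arma::rowvec out = kernel * Im2Col(image, 3, 1);
  out.print("convolution output:");
}
```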

There are still some things pending (a few more tests, fixing that padding issue, ...). I'll try to get them done right after my exams.

Thanks again!

ranjodhsingh1729 · Dec 08 '25

Script for verifying the test cases: https://gist.github.com/ranjodhsingh1729/48f28648187fd4eed7d30c95069808f7

ranjodhsingh1729 · Dec 13 '25