What is the performance in comparison with the original implementation?
Great implementation. Could you provide reproduction results that can be used for comparison against the original Caffe2 implementation? Thanks
I think the first conv should be Conv2d. Am I right? The correct version would look like this:
self.spatial_conv = nn.Conv2d(in_channels, intermed_channels, kernel_size=3,
                              stride=1, padding=1, bias=bias)
self.bn = nn.BatchNorm2d(intermed_channels)
self.relu = nn.ReLU()
self.temporal_conv = nn.Conv3d(intermed_channels, out_channels, temporal_kernel_size,
                               stride=temporal_stride, padding=temporal_padding, bias=bias)
I think it is okay as is. It should be kept as Conv3d, but it effectively behaves like Conv2d because the temporal kernel size is 1.
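For reference, here's a minimal sketch of the factorized (2+1)D convolution as I understand it from the paper: both layers stay `nn.Conv3d`, with the spatial one using a (1, 3, 3) kernel so it acts frame-wise like a 2D conv. The class name and channel counts here are illustrative, not the repo's exact code:

```python
import torch
import torch.nn as nn

# Sketch of the (2+1)D factorization: both convolutions are nn.Conv3d,
# but the spatial one uses a (1, 3, 3) kernel, so along the temporal axis
# it behaves exactly like a 2D convolution applied frame by frame.
class SpatioTemporalConvSketch(nn.Module):
    def __init__(self, in_channels, intermed_channels, out_channels, bias=False):
        super().__init__()
        # spatial conv: kernel size 1 in time, 3x3 in space
        self.spatial_conv = nn.Conv3d(in_channels, intermed_channels,
                                      kernel_size=(1, 3, 3),
                                      stride=1, padding=(0, 1, 1), bias=bias)
        self.bn = nn.BatchNorm3d(intermed_channels)
        self.relu = nn.ReLU()
        # temporal conv: kernel size 3 in time, 1x1 in space
        self.temporal_conv = nn.Conv3d(intermed_channels, out_channels,
                                       kernel_size=(3, 1, 1),
                                       stride=1, padding=(1, 0, 0), bias=bias)

    def forward(self, x):  # x: (N, C, T, H, W)
        return self.temporal_conv(self.relu(self.bn(self.spatial_conv(x))))

x = torch.randn(2, 3, 8, 16, 16)
out = SpatioTemporalConvSketch(3, 45, 64)(x)
print(out.shape)  # torch.Size([2, 64, 8, 16, 16])
```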
self.conv3 = SpatioTemporalResLayer(64, 128, 3, layer_sizes[1], block_type=block_type, downsample=True)
Why is downsample=True here? The input has 64 channels and the output has 128; I can't understand it. Can you help me? Thanks! @irhum
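Not the author, but my understanding is that downsample here refers to halving the spatial/temporal resolution with stride 2 in the layer's first block, while a 1x1x1 conv on the shortcut projects 64 channels to 128 so the residual addition still works. A hedged sketch of the idea (names and layer shapes are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Why downsample=True when channels go 64 -> 128: the residual shortcut must
# match the main branch in both channel count and resolution, so the first
# block halves T, H, W with stride 2 and a 1x1x1 conv reprojects the shortcut.
class DownsampleBlockSketch(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # main branch: stride-2 conv halves T, H, W and changes channel count
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size=3,
                              stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm3d(out_channels)
        # shortcut: 1x1x1 conv with stride 2 so shapes match for the addition
        self.downsample = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=1,
                      stride=2, bias=False),
            nn.BatchNorm3d(out_channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)) + self.downsample(x))

x = torch.randn(1, 64, 8, 56, 56)
print(DownsampleBlockSketch(64, 128)(x).shape)  # torch.Size([1, 128, 4, 28, 28])
```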
My finding is that R(2+1)D is actually slower than C3D with fp16; with fp32, R(2+1)D is faster.
PyTorch 1.3, CUDA 10.2, cuDNN 7.6.5
I think the newer cuDNN is quite efficient at performing 3D convolution on fp16 inputs.
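For anyone who wants to check this on their own setup, here's a rough timing sketch (layer sizes are arbitrary assumptions, not the repo's benchmark) comparing fp32 and fp16 Conv3d throughput under cuDNN:

```python
import time
import torch
import torch.nn as nn

# Rough timing sketch, not a rigorous benchmark; requires a CUDA GPU.
def time_conv3d(dtype, iters=50):
    conv = nn.Conv3d(64, 64, kernel_size=3, padding=1).cuda().to(dtype)
    x = torch.randn(8, 64, 16, 56, 56, device="cuda", dtype=dtype)
    torch.backends.cudnn.benchmark = True  # let cuDNN pick the fastest algo
    for _ in range(5):                     # warm-up so algo search isn't timed
        conv(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        conv(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

if torch.cuda.is_available():
    print(f"fp32: {time_conv3d(torch.float32) * 1e3:.2f} ms/iter")
    print(f"fp16: {time_conv3d(torch.float16) * 1e3:.2f} ms/iter")
```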