CNTK fails to build a model with the `ConvLSTM2D` layer in some cases
System information
- Have I written custom code (as opposed to using example directory):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 & Linux Ubuntu 18.04
- CNTK version: 2.7
- Python version: 3.6.9
- CUDA/cuDNN version: -
- GPU model and memory: -
Describe the current behavior
CNTK fails to build a model with the layer ConvLSTM2D. The error is shown below (raised at cntk/ops/__init__.py line 421). For the detailed parameters of ConvLSTM2D, refer to the code snippet at the end of this report.
<class 'ValueError'>, ValueError('Convolution operation requires that kernel dim 5 <= input dim 4
Analysis for CNTK error
We found that this CNTK error is jointly influenced by the input dimensions and kernel_size. The input dimensions are calculated in keras\backend\cntk_backend.py line 1731, as shown in the following picture.

The calculation in line 1731 yields different shapes for different parameters, as shown in the following table.
kernel_size=4; input_shape=[16,16,8]
| strides | output shape after convolution |
|---|---|
| 1 | [8,13,5] |
| 2 | [8,7,3] |
| 3 | [8,5,2] |
| 4 | [8,4,2] |
| 5 | [8,3,1] |
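The shapes in the table can be reproduced with the standard output-size formula for a convolution with `padding='valid'`, `floor((n - kernel_size) / strides) + 1` per spatial dimension. The sketch below is only that arithmetic, not the code at cntk_backend.py line 1731; the function name `valid_conv_output_shape` is hypothetical, and `filters=8` is taken from the reproduction script.

```python
def valid_conv_output_shape(input_shape, kernel_size, strides, filters):
    """Output shape of a 'valid' convolution over a channels_first input.

    input_shape is (channels, rows, cols); the result is [filters, rows', cols'].
    Hypothetical helper for illustration; it is not part of Keras or CNTK.
    """
    _, rows, cols = input_shape
    out_rows = (rows - kernel_size) // strides + 1
    out_cols = (cols - kernel_size) // strides + 1
    return [filters, out_rows, out_cols]


# Reproduce the table: kernel_size=4, input_shape=[16,16,8], strides 1..5
for strides in range(1, 6):
    print(strides, valid_conv_output_shape((16, 16, 8), 4, strides, 8))
```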
The output shape after the convolution operation is then used in the convolution at cntk\ops\__init__.py line 421, as shown in the following picture.

In this convolution, the 3rd dimension of convolution_map (e.g., 4 here) is compared with the 2nd dimension of toSequence_Input4 (e.g., 13 here), and the 4th dimension of convolution_map (e.g., 4 here) is compared with the 3rd dimension of toSequence_Input4 (e.g., 5 here). If a value in convolution_map is greater than the corresponding value in toSequence_Input4, CNTK raises an error like ValueError('Convolution operation requires that kernel dim xx <= input dim xx.
For example, if we set strides=1 and kernel_size=4 or (4,4), the shape of toSequence_Input4 is [8,13,5], as shown in the above picture. 4 is less than both 13 and 5, so the error is not triggered in this case. In contrast, if we change to strides=3, the shape of toSequence_Input4 is [8,5,2]. 4 is greater than 2, which leads to the error ValueError('Convolution operation requires that kernel dim 4 <= input dim 2.
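The comparison described above can be sketched as a small pre-check. This is an approximation of the rule as analyzed in this report, not CNTK's actual validation code; `hidden_state_shape` and `kernel_violations` are hypothetical names, and the `'valid'` padding formula is assumed.

```python
def hidden_state_shape(input_shape, kernel_size, strides, filters):
    """Shape [filters, rows', cols'] after the 'valid' input convolution."""
    _, rows, cols = input_shape
    return [filters,
            (rows - kernel_size) // strides + 1,
            (cols - kernel_size) // strides + 1]


def kernel_violations(input_shape, kernel_size, strides, filters):
    """Return (kernel_dim, input_dim) pairs where the kernel exceeds the
    corresponding spatial dimension of the convolved input."""
    _, rows, cols = hidden_state_shape(input_shape, kernel_size, strides, filters)
    return [(kernel_size, dim) for dim in (rows, cols) if kernel_size > dim]


print(kernel_violations((16, 16, 8), 4, 1, 8))  # [] : no error expected
print(kernel_violations((16, 16, 8), 4, 3, 8))  # [(4, 2)]
print(kernel_violations((16, 16, 8), 5, 1, 8))  # [(5, 4)]
```

The last call uses the parameters of the reproduction script (kernel_size=5, strides=1) and gives the same pair as the reported message, kernel dim 5 versus input dim 4.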
Key insights
The implementation of ConvLSTM2D on CNTK seems to have flaws. As summarized above, when kernel_size exceeds a spatial dimension of the input after the first convolution, CNTK fails to build the model.
Additionally, there are no relevant hints or warnings in the documentation for this problem, which may confuse CNTK users.
Application
When I run a MobileNetV2 model (with ConvLSTM2D) on CNTK, it raises the error shown in the following picture. However, the same model runs prediction normally on TensorFlow and Theano, and the two backends produce similar outputs.

Code to reproduce the issue
import os
# Select CNTK as the Keras backend; this must be set before importing keras
# (alternatively, set "backend": "cntk" in ~/.keras/keras.json).
os.environ['KERAS_BACKEND'] = 'cntk'

import numpy as np
import keras.layers as L
from keras.engine import Model, Input

# Input dtype defaults to float32
kwargs = {
    'filters': 8,
    'kernel_size': 5,
    # 'kernel_size': (1, 5),
    'strides': 1,
    'padding': 'valid',
    'data_format': 'channels_first',
    'dilation_rate': 1,
    'use_bias': True,
    'unit_forget_bias': False,
    'return_sequences': True,
    'go_backwards': False,
    'stateful': False,
    'dropout': 0.7232577469807254,
    'recurrent_dropout': 0.7507926892266159
}
# Shape: (batch, time, channels, rows, cols) because data_format is channels_first
data = 10 * np.random.random((1, 16, 16, 16, 8))
layer = L.convolutional_recurrent.ConvLSTM2D(**kwargs)
x = Input(batch_shape=data.shape)
y = layer(x)  # on CNTK, the ValueError is raised here while the graph is built
bk_model = Model(x, y)
print('finish')