PyTorch to CoreML via convert() in v4.0b3 has several bugs with Flexible Input Shapes; are seqLen and nFeatures swapped?
Objective:
Tell the coremltools converter that the sequence length of the input is variable. This is useful for LSTMs, Transformers, and other sequence models.
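For context, here is a minimal sketch of what "advising" the converter looks like, following the flexible-inputs docs linked below; the toy LSTM wrapper and the input name "x" are placeholders of mine, not taken from the attached test case:

```python
import torch
import coremltools as ct

class ToyLSTM(torch.nn.Module):
    """Placeholder model: returns only the sequence output of an LSTM."""
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=5, hidden_size=7, batch_first=True)
    def forward(self, x):
        out, _ = self.lstm(x)
        return out

# Trace with a (batch, seqLen, nFeatures) example input.
traced = torch.jit.trace(ToyLSTM().eval(), torch.rand(1, 3, 5))

# Declare dim 1 (the sequence length) as variable; keep batch and features fixed.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=(1, ct.RangeDim(2, 10), 5))],
)
```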
Reproducible:
Yes, all issues are reproduced in the test case, annotated and explained below, and in the logfile.
Summary:
- BUG 1: General. The mlmodel spec shows a duplicated shape (1,4,5) in the .mlmodel with ct.EnumeratedShapes(shapes= [ (1,4,5), (1,3,5) ] ). See also issue #756.
- BUG 2: TransformerEncoder. ct.EnumeratedShapes(shapes= [ (1,4,5), (1,3,5) ] ) is mis-parsed or swapped in conversion; it should work, but prediction fails.
- BUG 3: TransformerEncoder. ct.EnumeratedShapes(shapes= [ (1,3,5), (1,3,6) ] ) is mis-parsed or swapped in conversion; it should fail, but conversion passes.
- BUG 4: TransformerEncoder. ( 1, ct.RangeDim(2,10), 5 ) is mis-parsed or swapped in conversion; it should work, but prediction fails.
- BUG 5: TransformerEncoder. ( 1, 3, ct.RangeDim(2,10) ) is mis-parsed or swapped in conversion; it should fail, but conversion only warns.
Details below.
Ground truth is here https://coremltools.readme.io/docs/flexible-inputs
Testcase:
Yes, testFlexibleShape.aug26.txt
Run as: python3 testFlexibleShape.aug26.py
This test case runs two different PyTorch models (LSTM, TransformerEncoder), each of which accepts variable-length input, and converts each one with five different input shapes passed to ct.convert(): some fixed, some enumerated, some with RangeDim.
We test five input shapes:
- Fixed shape (1,3,5)
- inputShape = ct.EnumeratedShapes(shapes= [ (1,4,5), (1,3,5) ] )
- inputShape = ct.EnumeratedShapes(shapes= [ (1,3,5), (1,3,6) ] )
- inputShape = ( 1, ct.RangeDim(2,10), 5 )
- inputShape = ( 1, 3, ct.RangeDim(2,10) )
The first five tests (0,1,2,3,4) use the LSTM and the last five (5,6,7,8,9) use the TransformerEncoder.
Five bugs in total, I think.
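To make the harness concrete, here is a rough sketch of what the script does. ToyEncoder is a hypothetical stand-in (d_model=5, nhead=1, num_layers=1 are my guesses), ToyLSTM is re-used from the sketch in the Objective section, and the PASS/FAIL bookkeeping in the real script is more detailed than this:

```python
import torch
import coremltools as ct

class ToyEncoder(torch.nn.Module):
    """Hypothetical stand-in for the TransformerEncoder model under test."""
    def __init__(self):
        super().__init__()
        layer = torch.nn.TransformerEncoderLayer(d_model=5, nhead=1)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=1)
    def forward(self, x):
        return self.encoder(x)

input_shapes = [
    (1, 3, 5),                                            # type 0: fixed
    ct.EnumeratedShapes(shapes=[(1, 4, 5), (1, 3, 5)]),   # type 1: two sequence lengths
    ct.EnumeratedShapes(shapes=[(1, 3, 5), (1, 3, 6)]),   # type 2: two feature sizes
    (1, ct.RangeDim(2, 10), 5),                           # type 3: ranged sequence length
    (1, 3, ct.RangeDim(2, 10)),                           # type 4: ranged feature size
]

for model in (ToyLSTM().eval(), ToyEncoder().eval()):
    traced = torch.jit.trace(model, torch.rand(1, 3, 5))
    for i, shape in enumerate(input_shapes):
        try:
            mlmodel = ct.convert(traced, inputs=[ct.TensorType(name="x", shape=shape)])
            mlmodel.predict({"x": torch.rand(1, 3, 5).numpy()})
            print(type(model).__name__, i, "PASS")
        except Exception as err:
            print(type(model).__name__, i, "FAIL:", err)
```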
Setup:
macOS Catalina
Python version: 3.7.6 (v3.7.6:43364a7ae0, Dec 18 2019, 14:18:50) [Clang 6.0 (clang-600.0.57)]
Torch version: 1.6.0
CoreML tools version: 4.0b3
Log:
Log file is attached here. log.torch.1.6.0.txt
Interpretation, bug list, and issues for setups 0 to 9:
0 -------------------------------------------------- TEST= 0 inputShape type = 0 LSTM, Expected to PASS, Convert PASS, Predict PASS
No BUG. This is the basic behavior with a fixed input shape, as a sanity test with sequence length 3 and 5 input features.
1 -------------------------------------------------- TEST= 0 inputShape type = 1 LSTM Expected to PASS, Convert PASS, Predict PASS
One BUG.
Behavior is correct, but BUG 1: the mlmodel spec shows a duplicated shape (1,4,5) in the .mlmodel.
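For reference, this is roughly how the duplicate can be seen; it assumes mlmodel is the model converted with this test's EnumeratedShapes input and that the input name is "x" as in the sketches above:

```python
# Print the enumerated shapes recorded in the converted model's spec.
spec = mlmodel.get_spec()
enumerated = spec.description.input[0].type.multiArrayType.enumeratedShapes
for s in enumerated.shapes:
    print(tuple(s.shape))   # the log suggests (1, 4, 5) shows up twice here
```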
2 -------------------------------------------------- TEST= 0 inputShape type = 2 LSTM Expected to FAIL, Convert FAIL, no predict, looks ok.
No BUG.
We get the error ValueError: Incorrect weight matrix: hidden dim size mismatch. Provided (12, 28). Expecting <b, 4*DIRECTION*H>, but that makes sense, as the inputShape is ct.EnumeratedShapes(shapes= [ (1,3,5), (1,3,6) ] ) and the LSTM here cannot process [1 x 6] vectors, only [1 x 5].
3 -------------------------------------------------- TEST= 0 inputShape type = 3 LSTM Expected to PASS, Convert PASS, Predict PASS, no BUG
No BUG with this RangeDim on sequenceLength.
4 -------------------------------------------------- TEST= 0 inputShape type = 4 LSTM Expected to FAIL, Convert FAIL, Predict FAIL, no BUG
No BUG.
We get the error ValueError: Incorrect weight matrix: hidden dim size mismatch. Provided (12, 28). Expecting <b, 4*DIRECTION*H>, but that makes sense, as the inputShape is ( 1, 3, ct.RangeDim(2,10) ) and the LSTM here can process [1 x 5] input vectors but not [1 x N].
5 -------------------------------------------------- TEST= 1 inputShape type = 0 TransformerEncoder Expected to PASS, Convert PASS, Predict PASS, no BUG
No BUG. This is the basic TransformerEncoder behavior with a fixed input shape, as a sanity test with sequence length 3 and 5 input features.
6 -------------------------------------------------- TEST= 1 inputShape type = 1 TransformerEncoder Expected to PASS, Convert() is PASS which is ok, Predict FAIL BUG
The inputShape is now ct.EnumeratedShapes(shapes= [ (1,4,5), (1,3,5) ] ).
BUG 2:
First we see:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/coremltools/models/model.py:119: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast [5, 4, 1, 1, 1] and [5, 3, 1, 1, 1]".
but the mlmodel spec in the logfile actually looks ok.
The prediction fails with
RuntimeError: Error compiling model: "compiler error: Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast [5, 4, 1, 1, 1] and [5, 3, 1, 1, 1]".
In a nutshell, the TransformerEncoder with flexible input shape ct.EnumeratedShapes(shapes= [ (1,4,5), (1,3,5) ] ) SHOULD WORK, because we only change the sequence length (4 vs 3) and not the number of input features (5, i.e. d_model=5).
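A quick pure-PyTorch check supports this; ToyEncoder is only the hypothetical stand-in from the harness sketch above, so the exact model may differ from the test case, but the shapes match:

```python
# The torch model itself happily handles both enumerated shapes,
# so the converted model should as well.
encoder = ToyEncoder().eval()
with torch.no_grad():
    print(encoder(torch.rand(1, 3, 5)).shape)   # torch.Size([1, 3, 5])
    print(encoder(torch.rand(1, 4, 5)).shape)   # torch.Size([1, 4, 5])
```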
7 -------------------------------------------------- TEST= 1 inputShape type = 2 TransformerEncoder Expected to FAIL, Convert PASS which is a BUG, Predict FAIL ok
The inputShape is now ct.EnumeratedShapes(shapes= [ (1,3,5), (1,3,6) ] ), so we tell convert() that sequenceLength is 3 and numInputFeatures is 5 or 6.
That makes no sense, as the TransformerEncoder is hardwired to d_model=5 == number of input features.
BUG 3:
HOWEVER, the conversion passes when it should fail.
Speculation: it looks like coremltools flips/swaps some input dimensions, like seqLen and nFeatures, for TransformerEncoder but not for LSTM.
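And the pure-PyTorch counter-check for the "hardwired to d_model=5" point, again with the hypothetical ToyEncoder stand-in:

```python
# A 6-feature input is rejected by the torch model itself, so a converted
# model that advertises (1, 3, 6) as a valid input shape cannot be right.
encoder = ToyEncoder().eval()
with torch.no_grad():
    encoder(torch.rand(1, 3, 6))   # raises: the encoder only accepts a last dim of 5
```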
8 -------------------------------------------------- TEST= 1 inputShape type = 3 TransformerEncoder Expected to PASS, Convert PASS ok, Predict FAIL which is a bug
The inputShape is now ( 1, ct.RangeDim(2,10), 5 ), so we tell the TransformerEncoder that sequences can range over [2, 10] while nInputFeatures stays at 5 == d_model.
While the model spec looks good, prediction fails (a sketch of how the spec's shape range can be checked follows the traceback below).
BUG 4:
Traceback (most recent call last):
File "testFlexibleShape.aug26.py", line 101, in
9 -------------------------------------------------- TEST= 1 inputShape type = 4 TransformerEncoder Expected to FAIL, Convert PASS which is a BUG, Predict FAIL ok
The inputShape is now ( 1, 3, ct.RangeDim(2,10) ), so we tell the TransformerEncoder that the number of input features can range over [2, 10] while the sequence length stays at 3.
BUG 5:
That should fail, as such a dynamic model with a variable feature dimension is not possible. convert() ONLY warns, but it should actually fail!
Speculation: it looks like coremltools flips/swaps some input dimensions, like seqLen and nFeatures, for TransformerEncoder but not for LSTM.
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/coremltools/models/model.py:119: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast [2, 3, 1, 1, 1] and [5, 3, 1, 1, 1]".
Traceback (most recent call last):
File "testFlexibleShape.aug26.py", line 101, in