TF 2.8 support
This is the same as andrey-churkin's #1152 with the merge conflicts resolved.
Changes
- The Keras code was moved to the separate GitHub repository keras-team/keras in TF 2.6.0. The API endpoints for `tf.keras` stay unchanged, but all imports from `tensorflow.python.keras` are deprecated for TF >= 2.6.0. These imports were replaced by imports from `tensorflow.keras` for the public Keras API and by imports from `keras` for the private API.
- For backward compatibility with TF versions older than 2.6.0, the `tf_internals` module was created. It contains the imports from the TF/Keras private API: for TF < 2.6 the `tensorflow.python.keras` endpoint should be used for imports from the Keras private API, and for TF >= 2.6 the `keras` endpoint should be used. The endpoint for imports from the private TF API was not changed. (See the first sketch after this list.)
- The `map_vectorization` attribute was removed from `tf.data.experimental.OptimizationOptions` in TF 2.7. For backward compatibility with TF versions older than 2.7.0, an additional condition was added to the `examples.tensorflow.classification.datasets.builder.DatasetBuilder` class. (See the second sketch after this list.)
- TF 2.8.0 depends on numpy >= 1.20. For this reason, the numpy version was updated.
- The code of the `tf.keras.applications.mobilenet.MobileNet` model was updated in TF 2.6. For this reason, an additional condition was added to the `tests.tensorflow.test_models.mobilenet` module.
- The reference graphs for TF 2.5 and TF 2.8.0 differ: a new `NoOp` node, used as a placeholder for control edges, was added to the graphs in TF 2.8. An additional step that removes this node was added to the tests for TF 2.8, after which the reference graphs for TF 2.5 and TF 2.8.0 match. (See the third sketch after this list.)
- The `tf.ones()` method was changed in TF 2.8.0 (additional TF operations appear in the TF graph). For this reason, the reference graph for Mask R-CNN was updated for TF 2.8.0.
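For illustration, here is a minimal sketch of the version-gated import pattern that a compatibility module like `tf_internals` can use. The imported symbol and the use of `packaging` are assumptions for the example, not the exact NNCF code:

```python
import tensorflow as tf
from packaging import version

_TF_VERSION = version.parse(tf.__version__)

if _TF_VERSION >= version.parse("2.6.0"):
    # TF >= 2.6: the Keras private API lives in the standalone `keras` package.
    from keras.engine.keras_tensor import KerasTensor
else:
    # TF < 2.6: the Keras private API is vendored under tensorflow.python.keras.
    from tensorflow.python.keras.engine.keras_tensor import KerasTensor
```

Callers then import such symbols only from the compatibility module, so the version check lives in one place.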
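Similarly, a hedged sketch of the guard needed around the removed `map_vectorization` option; the dataset-builder context is simplified here to a bare `tf.data` pipeline:

```python
import tensorflow as tf
from packaging import version

options = tf.data.Options()
# map_vectorization was removed from tf.data.experimental.OptimizationOptions
# in TF 2.7, so the attribute may only be touched on older versions.
if version.parse(tf.__version__) < version.parse("2.7.0"):
    options.experimental_optimization.map_vectorization.enabled = True

dataset = tf.data.Dataset.range(8).with_options(options)
```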
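And a sketch of the kind of clean-up step the tests can apply to a TF 2.8 `GraphDef` before comparing it with a TF 2.5 reference graph. This is an illustrative helper, not the exact test code:

```python
def remove_no_op_nodes(graph_def):
    """Drop NoOp placeholder nodes and the control edges that point at them."""
    noop_names = {node.name for node in graph_def.node if node.op == "NoOp"}
    kept = [node for node in graph_def.node if node.op != "NoOp"]
    for node in kept:
        # Control inputs are prefixed with '^'; drop those targeting NoOps.
        inputs = [i for i in node.input if i.lstrip("^") not in noop_names]
        del node.input[:]
        node.input.extend(inputs)
    del graph_def.node[:]
    graph_def.node.extend(kept)
    return graph_def
```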
Reason for changes
These changes are required to support TensorFlow 2.8.0.
Related tickets
Ref: 79015 Ref: 79017
Tests
- Symbolic links to `reference_graphs/2.5` were added under `reference_graphs/2.8`. Only the graphs for MobileNet and Mask R-CNN were updated, for the reasons highlighted above.
@negvet, would you please advise us on which tests should be validated to merge this PR?
My thoughts are:
- precommit tests
- `tests/tensorflow/test_sanity_sample.py`
- `tests/tensorflow/test_weekly.py`
- `tests/tensorflow/test_sota_checkpoints.py`

Do we need more?
All the comments have been applied, @vinnamkim.
@dongkwan-kim01 Our pre-commit tests are all green. As the next step, I would like you to report the results of the tests we discussed above. Documentation on the roles of these tests should also be included.
Yes, I'll attach the test results, and the documentation will also be included when it's ready.
Thanks for your contribution! Could you provide a summary of the nightly and weekly tests for this PR in the ticket?
Yes, I'll attach the weekly and nightly test results to the ticket when they're done, but currently I'm troubleshooting the version problems that occur during the weekly and nightly tests. I'll provide the results when they're ready. Thank you!
Results for the evaluation:

Result of the OpenVINO IR model evaluation (AC is not supported for the Mask R-CNN model):
| Model | AC Measured | Reference FP32 (OV) | Diff FP32 | Within Threshold |
|---|---|---|---|---|
| inception_v3_imagenet_rb_sparsity_int8 | 77.24% | 78.03% | -0.79% | O |
| retinanet_coco_int8 | 33.16% | 33.33% | -0.17% | O |
| mask_rcnn_coco_int8 | - | - | - | - |
-> Both measured models are within the threshold (-1.0%).
Merged upstream develop because of a pylint error in the PyTorch code.
@dongkwan-kim01, this looks very strange. What precision did you use in the test? As you know, TF 2.8 uses TensorFloat-32 by default.
I used the same configuration (`examples/tensorflow/{task}/configs`) and the unit test code in `tests/tensorflow/test_sota_checkpoints.py`.
As for TF32 precision, AFAIK TensorFlow adopted TF32 as the default before version 2.5, and when I checked, both TF 2.5.3 (the current develop version) and TF 2.8.2 use TF32 by default. Could this be a problem in the new TF version?
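For reference, TF32 execution can be queried and disabled explicitly to rule it out as the source of the difference; a minimal sketch using `tf.config.experimental` calls that exist since TF 2.4:

```python
import tensorflow as tf

# True by default on Ampere-class GPUs in recent TF releases.
print(tf.config.experimental.tensor_float_32_execution_enabled())

# Force full FP32 matmuls/convolutions during evaluation.
tf.config.experimental.enable_tensor_float_32_execution(False)
```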
@andrey-churkin Please take a look at the accuracy drop issue.

The test on the CI server failed on one model's Diff Expected (mobilenet_v3 int8 & sparsity 42, highlighted yellow in the figure), while the diff threshold is -0.1%p. Should we adjust the threshold for this model?
@andrey-churkin @alexsu52
@dongkwan-kim01 As you can see, the model's expected accuracy value is 67.55, which means that this very same model was specifically measured to reach 67.55 accuracy in the past. If you did not retrain the models in question, and if this degradation is specific to this PR, then something is wrong with the code in this PR and needs to be investigated.
Yes, the reason why the accuracy differs from the previous TF version should be investigated, though the main logical difference is the change of the import path from the deprecated `tensorflow.python.keras` to `tf.keras`.
Below is the result of a TF 2.5 test that imports `tf.keras` rather than `tensorflow.python.keras` in our develop code. The accuracy is the same as previously measured, so I think the change comes from the framework itself, not from switching `tensorflow.python.keras` to `tf.keras`. I'm still investigating the reason.

@vshampor @andrey-churkin While running the TF 2.8 weekly training, the mobilenet_v3 int8 sparsity 42 test shows 67.61% and 67.62% accuracy, which is better than the expected result of 67.55%, so we can pass the nightly test.

And the reference FP32 full-precision accuracy in TF 2.8: tfrecords 68.39%, tfds 68.38%.
Hi @andrey-churkin, could you give your opinion on this result?
The improvement looks great, but now, just to be on the safe side, we would like to understand what changed exactly between the situation where you posted the screenshots with accuracy degradation and the current state. Was there a bug fixed somewhere in the code?
I didn't fix a bug in the code; rather, I retrained the model in TF 2.8. The code-level changes were only to the Keras layer import path, so there is no specific reason why the accuracy changed slightly. Do you have any suggestions about the accuracy change?
@dongkwan-kim01 Which test did you launch exactly in order to get the results above? A pytest launch command would be fine.
@vshampor
```sh
pytest --junitxml nncf-tests.xml tests/tensorflow/test_sota_checkpoints.py \
    -s \
    --sota-checkpoints-dir=/REDACTED/develop/tensorflow \
    --sota-data-dir=/REDACTED/tensorflow \
    --metrics-dump-path=./metrics_dump
```
If you need the data for the test, please tell me.
So, in an offline discussion it became clear that the accuracy in 2.8 for the two models could only be retained after retraining in 2.8, which, unfortunately, makes the resulting checkpoints incompatible with 2.5. We will have to keep both versions of the checkpoints in this case if we want to claim support for both 2.5 and 2.8.

The CI nightly test passed (test #85) after replacing the mobilenet_v3_small int8 sparsity 42 checkpoint with the one trained in TF 2.8.
Training for the detection and segmentation models is still running. I'll upload the checkpoints to the CI server in a path separate from the one our develop branch is using.
@dongkwan-kim01 Thanks for sharing the results! Good job!
I would like to summarize:
1. Our current TF 2.5 checkpoints were checked on this PR (sota_eval_tensorflow, CI job #65). The `mobilenet_v3_small_int8_w_sym_ch_half_a_asym_t_rb_sparsity_42` model did not pass the test. This model was retrained using TF 2.8 and the current branch.
2. Our current TF 2.5 checkpoints, with the `mobilenet_v3_small_int8_w_sym_ch_half_a_asym_t_rb_sparsity_42` model replaced, were checked again on this PR (sota_eval_tensorflow, CI job #85). All models passed the test.
3. We are now waiting for the results of the weekly test.
So, I think we can merge this PR if this test is successful. @dongkwan-kim01 Could you please check this message and comment if something is wrong? Thank you!
Thank you for the summary. The answers for 1, 2, and 3 are all yes. However, since the weekly training was done for the classification models, I tested them and found that the test failed for all MobileNet-related algorithms with an accuracy drop, even though they were trained in the TF 2.8 environment. I'll file a ticket about it, and for now we can merge this PR after loosening the accuracy thresholds for the nightly test.
jenkins retest this please