tensorflow icon indicating copy to clipboard operation
tensorflow copied to clipboard

[oneDNN][tfg][remapper] Contraction + BiasAdd + <Activation> Fusion.

Open mdfaijul opened this issue 3 years ago • 8 comments

This PR enables few fusions (contraction related):

  • Contraction (MatMul, Conv2D, Conv3D, and DepthwiseConv2DNative) + BiasAdd
  • Contraction + BiasAdd + Activation (Relu, Relu6, LeakyRelu, Tanh, and Sigmoid)

Note: We are gradually renaming MKL to oneDNN.

mdfaijul avatar May 17 '22 05:05 mdfaijul

@ChiaHungDuan Thank you very much for the suggestions. We will submit a format PR soon.

mdfaijul avatar May 17 '22 23:05 mdfaijul

@ChiaHungDuan @penpornk This PR is a formal PR. Please help reviewing this.

mdfaijul avatar Jul 05 '22 19:07 mdfaijul

LGTM. Please wait for Roman's approval

ChiaHungDuan avatar Jul 14 '22 21:07 ChiaHungDuan

@mdfaijul Can you please check @joker-eph's comments and keep us posted ? Thank you!

gbaned avatar Jul 20 '22 06:07 gbaned

@ChiaHungDuan @joker-eph Thank you very much for reviewing this PR and the suggestions. I have addressed all. Please check it.

mdfaijul avatar Jul 21 '22 20:07 mdfaijul

@ChiaHungDuan Thanks! again for the reviews. I have addressed those. Please check.

mdfaijul avatar Jul 22 '22 00:07 mdfaijul

@rdzhabarov @ChiaHungDuan Any update on this PR?

mdfaijul avatar Aug 08 '22 17:08 mdfaijul

@rdzhabarov Thanks! for the review. I have addressed the comments. Please check.

mdfaijul avatar Aug 08 '22 21:08 mdfaijul

@rdzhabarov Thanks! for the review. I have addressed the comments. Please check.

mdfaijul avatar Aug 10 '22 16:08 mdfaijul

@penpornk could you help merging this? I do not see to have a permission to "merge pull request".

rdzhabarov avatar Aug 11 '22 21:08 rdzhabarov

Don't merge PRs. They get converted to internal CLs, reviewed there and then merged via Copybara

life_of_pr

mihaimaruseac avatar Aug 11 '22 23:08 mihaimaruseac

@mdfaijul i see some trivial failures, could you fix those?

  • pass.cc:44 ClangTidy: do not use namespace using-directives; use using-declarations instead
  • pass.cc:121 ClangTidy: missing [#include] for 'std::string'
  • remapping_helper.h:63 ClangTidy: missing [#include] for 'std::string'

rdzhabarov avatar Aug 12 '22 19:08 rdzhabarov

@rdzhabarov I have addressed those. However, I am not sure how to run clang-tidy on tensorflow repository. Please check if it passes now.

mdfaijul avatar Aug 12 '22 22:08 mdfaijul

@rdzhabarov Please let us know if the PR caused any more failures or not. Thanks!

mdfaijul avatar Aug 15 '22 15:08 mdfaijul

@mdfaijul these are the failures

 tensorflow/core/transforms/remapper/remapping_helper.h:26 ClangTidy: do not use unnamed namespaces in header files
 tensorflow/core/transforms/remapper/remapping_helper.h:65 ClangTidy: unused function 'GetTfgOpName'

Let's fix these. If there will be yet another failures I'll take a look and will be fixing internally myself.

rdzhabarov avatar Aug 15 '22 16:08 rdzhabarov

@mdfaijul these are the failures

 tensorflow/core/transforms/remapper/remapping_helper.h:26 ClangTidy: do not use unnamed namespaces in header files
 tensorflow/core/transforms/remapper/remapping_helper.h:65 ClangTidy: unused function 'GetTfgOpName'

Let's fix these. If there will be yet another failures I'll take a look and will be fixing internally myself.

Thanks! @rdzhabarov. I will take a look into those. May I know the clang-tidy command being used for those failures. I have produced compile_commands.json and will try running cland-tidy on my side. It would be also nice to know if there is better a way to do clang-tidy on a bazel project like tensorflow.

mdfaijul avatar Aug 15 '22 16:08 mdfaijul

@rdzhabarov Please let us know if we have still internal failures or not. Thanks!

mdfaijul avatar Aug 16 '22 03:08 mdfaijul

Importing changes now.

rdzhabarov avatar Aug 16 '22 17:08 rdzhabarov

@mdfaijul can you check tensorflow/core/transforms/remapper/tests/contraction.mlir?

EDIT: I changed -remapper to -tfg-remapper but it's still failing.

rdzhabarov avatar Aug 16 '22 22:08 rdzhabarov

@rdzhabarov Nice catch. I forgot to update the RUN command. Fixed it. Thanks!

mdfaijul avatar Aug 16 '22 23:08 mdfaijul

@mdfaijul the test was failing when i fixed that. So i expect it to fail with your change.

rdzhabarov avatar Aug 16 '22 23:08 rdzhabarov

@rdzhabarov Is there any command for automated like bazel test?

mdfaijul avatar Aug 16 '22 23:08 mdfaijul

@rdzhabarov I ran this manually bazel-bin/tensorflow/core/transforms/tfg-transforms-opt --tfg-remapper tensorflow/core/transforms/remapper/tests/contraction.mlir | bazel-bin/external/llvm-project/llvm/FileCheck tensorflow/core/transforms/remapper/tests/contraction.mlir

And here is the output 2022-08-16 16:40:55.164940: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.

I do not see an error on my side, I think.

mdfaijul avatar Aug 16 '22 23:08 mdfaijul

@rdzhabarov Is it possible to give me a reproducer command?

mdfaijul avatar Aug 17 '22 00:08 mdfaijul

It looks like the command is:

tfg-transforms-opt --tfg-remapper long_path/contraction.mlir | FileCheck long_path/contraction.mlir

I have not tried to repro yet (i replaced actual path to the file with long_path)

rdzhabarov avatar Aug 17 '22 01:08 rdzhabarov

I was able to repro locally with

blaze test //third_party/tensorflow/core/transforms/remapper/...

(was running internally, I think bazel should work for you)

/tests/contraction.mlir:41:12: error: CHECK: expected string not found in input
 // CHECK: _FusedConv3D(%[[PLACEHOLDER]], %[[FILTER]], %[[BIAS]]) {{.*}} name("BiasAdd") {{.*}} fused_ops = ["BiasAdd"]
           ^
<stdin>:21:66: note: scanning from here
 %Const_1, %ctl_2 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[0.00119970727, 0.199058324]> : tensor<2xf32>} : () -> (tensor<*xf32>)
                                                                 ^
<stdin>:21:66: note: with "PLACEHOLDER" equal to "Placeholder"
 %Const_1, %ctl_2 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[0.00119970727, 0.199058324]> : tensor<2xf32>} : () -> (tensor<*xf32>)
                                                                 ^
<stdin>:21:66: note: with "FILTER" equal to "Const"
 %Const_1, %ctl_2 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[0.00119970727, 0.199058324]> : tensor<2xf32>} : () -> (tensor<*xf32>)
                                                                 ^
<stdin>:21:66: note: with "BIAS" equal to "Const_1"
 %Const_1, %ctl_2 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[0.00119970727, 0.199058324]> : tensor<2xf32>} : () -> (tensor<*xf32>)
                                                                 ^
contraction.mlir:61:12: error: CHECK: expected string not found in input
 // CHECK: _FusedDepthwiseConv2dNative(%[[PLACEHOLDER]], %[[FILTER]], %[[BIAS]]) {{.*}} name("BiasAdd") {{.*}} fused_ops = ["BiasAdd"]
           ^
<stdin>:34:66: note: scanning from here
 %Const_6, %ctl_7 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[-0.777153254, -0.696344554, 0.0879275202, -1.49649549]> : tensor<4xf32>} : () -> (tensor<*xf32>)
                                                                 ^
<stdin>:34:66: note: with "PLACEHOLDER" equal to "Placeholder"
 %Const_6, %ctl_7 = Const device("/device:CPU:0") name("Const_1") {dtype = f32, value = dense<[-0.777153254, -0.696344554, 0.0879275202, -1.49649549]> : tensor<4xf32>} : () -> (tensor<*xf32>)

rdzhabarov avatar Aug 17 '22 01:08 rdzhabarov

@rdzhabarov Thank you very much for the error logs. This is very helpful. I could reproduce those errors with environment variable TF_ENABLE_ONEDNN_OPTS=0. I had an impression that oneDNN has been enabled by default. But I realized that it is default only on cascade-lake and onwards. So I separated out onednn based patterns test (tensorflow/core/transforms/remapper/tests/onednn_contraction.mlir). Now we have added 2 tests

bazel-bin/tensorflow/core/transforms/tfg-transforms-opt --tfg-remapper=enable-onednn-patterns tensorflow/core/transforms/remapper/tests/onednn_contraction.mlir | bazel-bin/external/llvm-project/llvm/FileCheck tensorflow/core/transforms/remapper/tests/onednn_contraction.mlir

bazel-bin/tensorflow/core/transforms/tfg-transforms-opt --tfg-remapper tensorflow/core/transforms/remapper/tests/contraction.mlir | bazel-bin/external/llvm-project/llvm/FileCheck tensorflow/core/transforms/remapper/tests/contraction.mlir

Please let me know if it is working now.

mdfaijul avatar Aug 17 '22 15:08 mdfaijul

Thank you! I'm importing/testing.

rdzhabarov avatar Aug 17 '22 17:08 rdzhabarov