[Perf] The time of Conv2DBackpropInput is very long in BlazePose/hand_detector models in WebGL

Open qjia7 opened this issue 4 years ago • 6 comments

hand_detector

| Kernel | Time (ms) | Inputs | Output | GPUPrograms |
| --- | --- | --- | --- | --- |
| Conv2DBackpropInput | 9.73 | input0: 4D[1,16,16,256], input1: 4D[2,2,128,256] | 1,32,32,128 | UnpackProgram: 0.026083, Conv2DDerInputProgram: 9.706294 |
| Conv2DBackpropInput | 4.93 | input0: 4D[1,8,8,256], input1: 4D[2,2,256,256] | 1,16,16,256 | UnpackProgram: 0.011333, Conv2DDerInputProgram: 4.91648 |

BlazePose with inputSize 256, inputType Tensor

| Kernel | Time (ms) | Inputs | Output | GPUPrograms |
| --- | --- | --- | --- | --- |
| Conv2DBackpropInput | 20.20 | input0: 4D[1,7,7,1152], input1: 4D[2,2,192,1152] | 1,14,14,192 | UnpackProgram: 0.042083, Conv2DDerInputProgram: 20.159669 |
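
To put these timings in perspective, the arithmetic cost of each call can be estimated from the reported shapes alone. The sketch below assumes a stride of 2 (not stated in the issue, but consistent with a 2x2 filter doubling the spatial size), and the helper names `backpropInputOutputShape` and `macCount` are hypothetical. It is a back-of-the-envelope estimate, not how tfjs computes anything.

```typescript
// Layouts follow the tables above:
// input is NHWC [batch, height, width, inChannels],
// filter is [filterHeight, filterWidth, outChannels, inChannels].
type Shape4D = [number, number, number, number];

// Output shape of Conv2DBackpropInput, ASSUMING each spatial dimension of the
// output equals the input dimension times the stride.
function backpropInputOutputShape(
  input: Shape4D,
  filter: Shape4D,
  stride: number
): Shape4D {
  const [batch, inH, inW, inC] = input;
  const [, , outC, filterInC] = filter;
  if (inC !== filterInC) {
    throw new Error(`channel mismatch: ${inC} vs ${filterInC}`);
  }
  return [batch, inH * stride, inW * stride, outC];
}

// Multiply-accumulate count, ASSUMING the stride equals the filter size, so
// every output element receives exactly one filter tap per input channel.
function macCount(input: Shape4D, filter: Shape4D, stride: number): number {
  const [filterH, filterW] = filter;
  if (filterH !== stride || filterW !== stride) {
    throw new Error("this estimate only covers stride == filter size");
  }
  const [batch, outH, outW, outC] = backpropInputOutputShape(input, filter, stride);
  return batch * outH * outW * outC * input[3];
}

// The three calls reported in the tables.
const reportedCalls: Array<[Shape4D, Shape4D]> = [
  [[1, 16, 16, 256], [2, 2, 128, 256]],
  [[1, 8, 8, 256], [2, 2, 256, 256]],
  [[1, 7, 7, 1152], [2, 2, 192, 1152]],
];
for (const [input, filter] of reportedCalls) {
  console.log(backpropInputOutputShape(input, filter, 2), macCount(input, filter, 2));
}
```

Under those assumptions the three calls come to 33,554,432, 16,777,216 and 43,352,064 multiply-accumulates. The BlazePose call therefore does roughly 1.3x the arithmetic of the larger hand_detector call while taking roughly 2x as long.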

qjia7 avatar Jun 09 '21 02:06 qjia7

fyi @pyu10055 @lina128 @jinjingforever

qjia7 avatar Jun 09 '21 02:06 qjia7

This is very informative, thank you @qjia7! In the GPUPrograms column, is Conv2DDerInputProgram the packed version and UnpackProgram the unpacked version, both from the WebGL backend?

lina128 avatar Jun 09 '21 17:06 lina128

@lina128 UnpackProgram is the shader that unpacks the outputs of the previous kernel. Conv2DDerInputProgram is the shader for Conv2DBackpropInput, and it only supports unpacked textures.
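
For readers unfamiliar with the packed/unpacked distinction, here is a rough CPU-side sketch. It assumes, as I understand the tfjs WebGL backend, that a packed texture stores a 2x2 block of neighbouring values in the four channels of one texel, while an unpacked texture stores one value per texel. The names `pack2x2` and `unpack2x2` are hypothetical, and this is an illustration of the layout idea, not the actual shader code.

```typescript
// One RGBA texel holding a 2x2 block: [topLeft, topRight, bottomLeft, bottomRight].
type Texel = [number, number, number, number];

// Pack a row-major rows x cols matrix into 2x2 blocks (ASSUMED layout).
// Positions that fall outside the matrix are padded with 0.
function pack2x2(values: number[], rows: number, cols: number): Texel[] {
  const texRows = Math.ceil(rows / 2);
  const texCols = Math.ceil(cols / 2);
  const at = (r: number, c: number): number =>
    r < rows && c < cols ? values[r * cols + c] : 0;
  const texels: Texel[] = [];
  for (let tr = 0; tr < texRows; tr++) {
    for (let tc = 0; tc < texCols; tc++) {
      const r = 2 * tr;
      const c = 2 * tc;
      texels.push([at(r, c), at(r, c + 1), at(r + 1, c), at(r + 1, c + 1)]);
    }
  }
  return texels;
}

// Inverse of pack2x2: recover one value per element, which is the kind of
// work an unpacking pass has to do before a kernel that reads unpacked data.
function unpack2x2(texels: Texel[], rows: number, cols: number): number[] {
  const texCols = Math.ceil(cols / 2);
  const out: number[] = new Array<number>(rows * cols);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const texel = texels[Math.floor(r / 2) * texCols + Math.floor(c / 2)];
      const channel = (r % 2) * 2 + (c % 2);
      out[r * cols + c] = texel[channel];
    }
  }
  return out;
}
```

In this picture, a kernel that only reads unpacked data needs an extra unpacking pass whenever the previous kernel produced packed output, which matches the small UnpackProgram entries in the tables.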

pyu10055 avatar Jun 09 '21 17:06 pyu10055

FYI, https://github.com/tensorflow/tfjs/pull/6603 uses a simulated matmul_vec4 on WebGPU, which can greatly improve the performance of Conv2DBackpropInput.
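
As background on why a matmul formulation is even possible, here is a minimal CPU sketch for the special case seen in this issue, where the stride equals the filter size (assumed to be 2 here). It is not taken from the PR and makes no claim about how the PR is implemented; the function names are hypothetical. In that special case each output pixel depends on exactly one input pixel and one filter tap, so the whole operation reduces to one matrix product followed by a rearrangement.

```typescript
// x:      [h, w, inC] flattened row-major (batch omitted for brevity)
// filter: [k, k, outC, inC] flattened row-major, with stride ASSUMED equal to k

// Direct form: each output pixel reads one input pixel and one filter tap.
function backpropInputDirect(
  x: number[], h: number, w: number, inC: number,
  filter: number[], k: number, outC: number
): number[] {
  const outW = w * k;
  const out: number[] = new Array<number>(h * k * outW * outC).fill(0);
  for (let oy = 0; oy < h * k; oy++) {
    for (let ox = 0; ox < outW; ox++) {
      const iy = Math.floor(oy / k), ix = Math.floor(ox / k);
      const tap = (oy % k) * k + (ox % k);
      for (let oc = 0; oc < outC; oc++) {
        let sum = 0;
        for (let ic = 0; ic < inC; ic++) {
          sum += x[(iy * w + ix) * inC + ic] * filter[(tap * outC + oc) * inC + ic];
        }
        out[(oy * outW + ox) * outC + oc] = sum;
      }
    }
  }
  return out;
}

// Matmul form: C = A * B^T with A = x as [h*w, inC] and B = filter as
// [k*k*outC, inC], then scatter each row of C into its k x k output block.
function backpropInputAsMatmul(
  x: number[], h: number, w: number, inC: number,
  filter: number[], k: number, outC: number
): number[] {
  const m = h * w, n = k * k * outC, outW = w * k;
  const c: number[] = new Array<number>(m * n).fill(0);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let ic = 0; ic < inC; ic++) sum += x[i * inC + ic] * filter[j * inC + ic];
      c[i * n + j] = sum;
    }
  }
  const out: number[] = new Array<number>(h * k * outW * outC).fill(0);
  for (let i = 0; i < m; i++) {
    const iy = Math.floor(i / w), ix = i % w;
    for (let tap = 0; tap < k * k; tap++) {
      const oy = iy * k + Math.floor(tap / k), ox = ix * k + (tap % k);
      for (let oc = 0; oc < outC; oc++) {
        out[(oy * outW + ox) * outC + oc] = c[i * n + tap * outC + oc];
      }
    }
  }
  return out;
}
```

For the BlazePose call in the table, this view would correspond to a [49, 1152] by [1152, 768] product under the stride assumption, which is the kind of shape that tiled or vectorized matmul kernels are designed for.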

qjia7 avatar Jul 11 '22 05:07 qjia7

FYI @Linchenn

pyu10055 avatar Jul 11 '22 16:07 pyu10055

Thank you Jiajia @qjia7! This is a great inspiration. Let me see if we can reuse this algorithm and, at least, we could have a packed version of Conv2DDerInputProgram to improve the performance.

Linchenn avatar Jul 11 '22 17:07 Linchenn

hand_detector is currently deprecated, while BlazePoseDetector is accelerated with packed Conv2DBackpropInput. However, Conv2DBackpropInput is still the performance bottleneck for BlazePoseDetector.

Linchenn avatar Feb 04 '23 00:02 Linchenn

Are you satisfied with the resolution of your issue? Yes No

google-ml-butler[bot] avatar Feb 06 '23 22:02 google-ml-butler[bot]

Hi, @qjia7

Apologies for the delayed response. We're revisiting our older issues to check whether they have been resolved. May I know whether you are still looking for a solution, or has this issue already been taken care of?

I see that PR https://github.com/tensorflow/tfjs/pull/7339 was merged, but if I'm not wrong it fixes this issue only partially. Do you want to keep this issue open until it is fixed completely?

If I have missed something here, please let me know. Thank you!

gaikwadrahul8 avatar Jun 05 '23 14:06 gaikwadrahul8

Originally, it was fixed by https://github.com/tensorflow/tfjs/pull/7339. But later, another improvement, https://github.com/tensorflow/tfjs/pull/7386#issuecomment-1436535284, brought some regressions on Intel devices. I am OK with closing this one and filing a new one to track that issue.

qjia7 avatar Jun 06 '23 00:06 qjia7

Hi, @qjia7

Thank you for the confirmation. If you're okay with closing this issue, please feel free to close it now and submit a new issue to track the other problem. Thank you!

gaikwadrahul8 avatar Jun 09 '23 10:06 gaikwadrahul8

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Jun 20 '23 01:06 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar Jun 27 '23 02:06 github-actions[bot]
