[Perf] Conv2DBackpropInput is very slow in the BlazePose/hand_detector models on WebGL
hand_detector
| Kernel | Time(ms) | Inputs | Output | GPUPrograms |
|---|---|---|---|---|
| Conv2DBackpropInput | 9.73 | input0: 4D[1,16,16,256], input1: 4D[2,2,128,256] | 1,32,32,128 | UnpackProgram: 0.026083, Conv2DDerInputProgram: 9.706294 |
| Conv2DBackpropInput | 4.93 | input0: 4D[1,8,8,256], input1: 4D[2,2,256,256] | 1,16,16,256 | UnpackProgram: 0.011333, Conv2DDerInputProgram: 4.91648 |
BlazePose with inputSize 256, inputType Tensor
| Kernel | Time(ms) | Inputs | Output | GPUPrograms |
|---|---|---|---|---|
| Conv2DBackpropInput | 20.20 | input0: 4D[1,7,7,1152], input1: 4D[2,2,192,1152] | 1,14,14,192 | UnpackProgram: 0.042083, Conv2DDerInputProgram: 20.159669 |
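As a sanity check on the rows above, the output shapes follow directly from a stride-2, 'same'-padding gradient. The sketch below is a hypothetical helper (the function name is ours, not a tfjs internal) that reproduces the Output column from the Inputs column, assuming stride 2 and NHWC layout:

```javascript
// Hypothetical shape helper, assuming stride-2 'same' padding (NHWC).
// dyShape: [batch, h, w, outChannels] (input0 in the tables above)
// filterShape: [fh, fw, inChannels, outChannels] (input1 in the tables above)
function conv2dBackpropInputShape(dyShape, filterShape, stride) {
  const [batch, h, w] = dyShape;
  const inC = filterShape[2]; // gradient restores the forward conv's input channels
  // With 'same' padding, the gradient upsamples spatial dims by the stride.
  return [batch, h * stride, w * stride, inC];
}
```

For the first hand_detector row, `conv2dBackpropInputShape([1, 16, 16, 256], [2, 2, 128, 256], 2)` gives `[1, 32, 32, 128]`, matching the table.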
fyi @pyu10055 @lina128 @jinjingforever
This is very informative, thank you @qjia7! In the GPUPrograms column, is Conv2DDerInputProgram the packed version and UnpackProgram the unpacked version, both from the WebGL backend?
@lina128 the UnpackProgram is the shader that unpacks the outputs of the previous kernel; the Conv2DDerInputProgram is the shader for Conv2DBackpropInput, which only supports unpacked textures.
FYI, https://github.com/tensorflow/tfjs/pull/6603 shows that using the simulated matmul_vec4 on WebGPU can greatly improve the performance of Conv2DBackpropInput.
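The matmul reformulation behind that PR can be illustrated on a toy single-channel case. This is a hypothetical plain-JS sketch, not the actual WebGPU shader: when the stride equals the filter size (as in the 2x2/stride-2 kernels above), the scattered taps don't overlap, so the gradient is one matmul of the flattened dy against the flattened filter, followed by a block scatter (col2im):

```javascript
// Reference scatter-add form of Conv2DBackpropInput (single channel).
// dy: [h][w], filter: [fh][fw]; output is [h*stride][w*stride].
function deconvNaive(dy, filter, stride) {
  const h = dy.length, w = dy[0].length;
  const fh = filter.length, fw = filter[0].length;
  const out = Array.from({length: h * stride}, () => new Array(w * stride).fill(0));
  for (let i = 0; i < h; i++)
    for (let j = 0; j < w; j++)
      for (let fi = 0; fi < fh; fi++)
        for (let fj = 0; fj < fw; fj++)
          out[i * stride + fi][j * stride + fj] += dy[i][j] * filter[fi][fj];
  return out;
}

// Matmul form: [h*w, 1] x [1, fh*fw], then col2im (block scatter).
// On GPU the matmul is the part that can be vectorized vec4-style.
function deconvAsMatmul(dy, filter, stride) {
  const h = dy.length, w = dy[0].length;
  const fw = filter[0].length;
  const filtFlat = filter.flat();
  const rows = dy.flat().map(v => filtFlat.map(f => v * f)); // the matmul
  const out = Array.from({length: h * stride}, () => new Array(w * stride).fill(0));
  rows.forEach((row, k) => {                                  // col2im scatter
    const i = Math.floor(k / w), j = k % w;
    row.forEach((v, t) => {
      out[i * stride + Math.floor(t / fw)][j * stride + (t % fw)] += v;
    });
  });
  return out;
}
```

Both forms produce identical outputs; the win on GPU comes from the matmul half being vectorizable, which the scatter-add form is not.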
FYI @Linchenn
Thank you Jiajia @qjia7! This is a great inspiration. Let me see if we can reuse this algorithm and, at least, we could have a packed version of Conv2DDerInputProgram to improve the performance.
hand_detector is currently deprecated, while BlazePoseDetector is accelerated with packed Conv2DBackpropInput. However, Conv2DBackpropInput is still the performance bottleneck for BlazePoseDetector.
Hi, @qjia7
Apologies for the delayed response. We're revisiting our older issues and checking whether they have been resolved by now. May I know whether you are still looking for a solution, or has this issue been taken care of?
I see that PR https://github.com/tensorflow/tfjs/pull/7339 got merged, but if I'm not wrong it only fixes this issue partially. Do you want to keep this issue open until it is fixed completely?
If I have missed something here, please let me know. Thank you!
Originally, this was fixed by https://github.com/tensorflow/tfjs/pull/7339. However, a later improvement (https://github.com/tensorflow/tfjs/pull/7386#issuecomment-1436535284) introduced some regressions on Intel devices. I am OK to close this one and file a new issue to track that regression.
Hi, @qjia7
Thank you for the confirmation. If you're okay with closing this issue from your end, please feel free to close it now and submit a new issue to track the other problem. Thank you!
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
This issue was closed due to lack of activity after being marked stale for the past 7 days.