Jiajia Qin issues

Results: 11 issues of Jiajia Qin

webgpu: Merge MatMulPackedProgram and MatMulPackedVec4Program

This PR merges MatMulPackedVec4Program into MatMulPackedProgram and refactors MatMulSplitKProgram. To see the logs from the Cloud Build CI, please join either our [discussion](https://groups.google.com/a/tensorflow.org/forum/#!forum/tfjs) or [announcement](https://groups.google.com/a/tensorflow.org/forum/#!forum/tfjs-announce) mailing list. --- This change...

[Perf] The time of Conv2DBackpropInput is very long in BlazePose/hand_detector models in WebGL

type:bug

[Perf] The performance of conv2d is very very poor if the inChannel and outChannel are small and height and width are large in webgl.

Tested using https://honry.github.io/webnn-samples/style_transfer/?backend=webgl

Type | Time(ms) | Inputs | Output
-- | -- | -- | --
Conv2D | 82.86 | input0: 4D[1,548,548,4] input1: 4D[9,9,4,3] | 1,540,540,3
Conv2D | 63.09 |...

type:bug

webgl: Fix NaN issue

Fix #6822 Problem 1: On some GPUs, even if `a` and `b` are both non-NaN, the value of `isNaN` in `vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0));` is still larger...

[WebGL] Optimize the async read

The `async read` in `backend_webgl.ts` always creates a new `PIXEL_PACK_BUFFER` [buffer](https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/backend_webgl.ts#L336). Once the download is finished, the buffer is [deleted](https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/backend_webgl.ts#L370) on the GPU. So this buffer is created and deleted over and...

type:others
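
The create-and-delete pattern described in the async read issue could in principle be avoided by reusing buffers. The sketch below is only an illustration of that idea, not the change proposed in the issue or the tfjs implementation: `ReadbackBufferPool`, `create`, and `destroy` are hypothetical names, and the two callbacks stand in for the WebGL2 calls (`gl.createBuffer`/`gl.bufferData` and `gl.deleteBuffer`) so the example runs without a GPU context.

```typescript
// Hypothetical sketch: reuse readback buffers by byte size instead of
// creating and deleting one for every async read.
class ReadbackBufferPool<B> {
  // Free buffers, grouped by their byte size.
  private free = new Map<number, B[]>();
  private create: (byteSize: number) => B;
  private destroy: (buffer: B) => void;
  // Number of real allocations performed so far.
  created = 0;

  constructor(
      create: (byteSize: number) => B, destroy: (buffer: B) => void) {
    this.create = create;    // assumed to wrap gl.createBuffer + gl.bufferData
    this.destroy = destroy;  // assumed to wrap gl.deleteBuffer
  }

  // Return a free buffer of exactly this size, or allocate a new one.
  acquire(byteSize: number): B {
    const list = this.free.get(byteSize);
    if (list !== undefined && list.length > 0) {
      return list.pop() as B;
    }
    this.created++;
    return this.create(byteSize);
  }

  // Hand the buffer back once the download has finished.
  release(byteSize: number, buffer: B): void {
    const list = this.free.get(byteSize);
    if (list === undefined) {
      this.free.set(byteSize, [buffer]);
    } else {
      list.push(buffer);
    }
  }

  // Delete every pooled buffer, for example when the backend is disposed.
  dispose(): void {
    this.free.forEach(list => list.forEach(b => this.destroy(b)));
    this.free.clear();
  }
}
```

With a pool like this, two consecutive reads of the same size would trigger one allocation rather than two. The trade-off is that pooled buffers keep GPU memory alive until `dispose()` is called, so a real implementation would likely need a size cap or eviction policy.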

[js/webgpu] Optimize grouped conv

[webgpu-native] opt matmulnbits

[webgpu] Make the GQA's intermediate buffer static

[webgpu] Enable graph capture

[webgpu] Support broadcast attention_bias