Jiajia Qin

11 issues by Jiajia Qin

This PR merges MatMulPackedVec4Program into MatMulPackedProgram and refactors MatMulSplitKProgram. To see the logs from the Cloud Build CI, please join either our [discussion](https://groups.google.com/a/tensorflow.org/forum/#!forum/tfjs) or [announcement](https://groups.google.com/a/tensorflow.org/forum/#!forum/tfjs-announce) mailing list. --- This change...

hand_detector

Kernel | Time(ms) | Inputs | Output | GPUPrograms
-- | -- | -- | -- | --
Conv2DBackpropInput | 9.73 | input0: 4D[1,16,16,256]; input1: 4D[2,2,128,256] | 1,32,32,128 | UnpackProgram:...
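The output shape in this row follows from the usual transposed-convolution size rule, out = (in - 1) * stride + filter, for 'valid' padding. The sketch below assumes NHWC input, a filter laid out as [filterHeight, filterWidth, outDepth, inDepth] (the layout documented for tfjs `conv2dTranspose`), and a stride of 2; the stride and padding are not stated in the table. The function name `conv2dTransposeShape` is hypothetical.

```typescript
// Shape of a rank-4 tensor, e.g. [batch, height, width, depth] in NHWC layout.
type Shape4 = [number, number, number, number];

// Output shape of a 2-D transposed convolution (Conv2DBackpropInput) with
// 'valid' padding. Assumes NHWC input and a filter laid out as
// [filterHeight, filterWidth, outDepth, inDepth].
function conv2dTransposeShape(input: Shape4, filter: Shape4, stride: number): Shape4 {
  const [batch, inH, inW, inDepth] = input;
  const [fH, fW, outDepth, filterInDepth] = filter;
  if (inDepth !== filterInDepth) {
    throw new Error(`input depth ${inDepth} does not match filter inDepth ${filterInDepth}`);
  }
  // (in - 1) * stride spans the strided input positions; the filter footprint adds f.
  return [batch, (inH - 1) * stride + fH, (inW - 1) * stride + fW, outDepth];
}

// hand_detector row; stride 2 is an assumption, not given in the table.
const out = conv2dTransposeShape([1, 16, 16, 256], [2, 2, 128, 256], 2);
console.log(out.join(",")); // 1,32,32,128
```

Under these assumptions the computed shape matches the Output column; with 'same' padding and stride 2 the spatial size would also be in * stride = 32.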

type:bug

Tested using https://honry.github.io/webnn-samples/style_transfer/?backend=webgl

Type | Time(ms) | Inputs | Output
-- | -- | -- | --
Conv2D | 82.86 | input0: 4D[1,548,548,4]; input1: 4D[9,9,4,3] | 1,540,540,3
Conv2D | 63.09 |...
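The first Conv2D row is consistent with a stride-1, 'valid'-padding convolution, where each spatial size shrinks to in - filter + 1. This sketch assumes NHWC input and a filter laid out as [filterHeight, filterWidth, inDepth, outDepth] (the layout documented for tfjs `conv2d`); stride and padding are assumptions, since the table does not list them. The name `conv2dValidShape` is hypothetical.

```typescript
// Shape of a rank-4 tensor, e.g. [batch, height, width, depth] in NHWC layout.
type Shape4 = [number, number, number, number];

// Output shape of a 2-D convolution with stride 1 and 'valid' padding.
// Assumes NHWC input and a filter laid out as [filterHeight, filterWidth, inDepth, outDepth].
function conv2dValidShape(input: Shape4, filter: Shape4): Shape4 {
  const [batch, inH, inW, inDepth] = input;
  const [fH, fW, filterInDepth, outDepth] = filter;
  if (inDepth !== filterInDepth) {
    throw new Error(`input depth ${inDepth} does not match filter inDepth ${filterInDepth}`);
  }
  // The filter fits at (in - filter + 1) positions along each spatial axis.
  return [batch, inH - fH + 1, inW - fW + 1, outDepth];
}

const out = conv2dValidShape([1, 548, 548, 4], [9, 9, 4, 3]);
console.log(out.join(",")); // 1,540,540,3
```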

type:bug

Fixes #6822. Problem 1: On some GPUs, even when `a` and `b` are both non-NaN, the value of `isNaN` in `vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0));` is still larger...
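For reference, the GLSL expression is meant to produce, per component, 1.0 when either operand is NaN and exactly 0.0 otherwise; the `min(..., 1.0)` clamps the "both NaN" case from 2.0 back to 1.0. A small CPU sketch of that intended result, which can serve as an expected value when checking GPU output; the function name `isNaNMask` is hypothetical and not part of tfjs.

```typescript
type Vec4 = [number, number, number, number];

// CPU reference for the per-component value of the GLSL expression
//   vec4 isNaN = min(vec4(isnan(a)) + vec4(isnan(b)), vec4(1.0));
// Component i is 1 if a[i] or b[i] is NaN, and 0 otherwise.
function isNaNMask(a: Vec4, b: Vec4): Vec4 {
  return a.map((ai, i) => {
    const flagA = Number.isNaN(ai) ? 1 : 0;   // vec4(isnan(a))[i]
    const flagB = Number.isNaN(b[i]) ? 1 : 0; // vec4(isnan(b))[i]
    // Clamp with min so that "both NaN" still yields 1 rather than 2.
    return Math.min(flagA + flagB, 1);
  }) as Vec4;
}

console.log(isNaNMask([1, NaN, 3, NaN], [NaN, 2, 3, NaN]).join(",")); // 1,1,0,1
```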

The `async read` in `backend_webgl.ts` always creates a new `PIXEL_PACK_BUFFER` [buffer](https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/backend_webgl.ts#L336). Once the download finishes, [it](https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgl/src/backend_webgl.ts#L370) is deleted on the GPU. As a result, this buffer is created and deleted over and...
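The snippet is cut off before it describes a fix. One common way to avoid this create/delete churn is to pool buffers by size and reuse them; the sketch below illustrates that idea only and is not necessarily what the PR does. The GPU calls are hidden behind a hypothetical `BufferAllocator` interface (in WebGL2, `create` would presumably wrap `gl.createBuffer()` plus `gl.bufferData(gl.PIXEL_PACK_BUFFER, size, gl.STREAM_READ)`, and `destroy` would wrap `gl.deleteBuffer()`); `BufferPool` is likewise a hypothetical name.

```typescript
// Hypothetical stand-in for the GPU calls that create and delete a buffer of a
// given byte size. This is not a tfjs API.
interface BufferAllocator<B> {
  create(byteSize: number): B;
  destroy(buffer: B): void;
}

// Keeps released buffers in per-size free lists so that repeated reads of the
// same size reuse an existing buffer instead of creating and deleting one each time.
class BufferPool<B> {
  private readonly free = new Map<number, B[]>();
  private readonly allocator: BufferAllocator<B>;

  constructor(allocator: BufferAllocator<B>) {
    this.allocator = allocator;
  }

  // Returns a free buffer of exactly `byteSize` bytes, creating one only if none is available.
  acquire(byteSize: number): B {
    const reused = this.free.get(byteSize)?.pop();
    return reused !== undefined ? reused : this.allocator.create(byteSize);
  }

  // Hands a buffer back to the pool instead of deleting it.
  release(byteSize: number, buffer: B): void {
    const list = this.free.get(byteSize);
    if (list) {
      list.push(buffer);
    } else {
      this.free.set(byteSize, [buffer]);
    }
  }

  // Deletes every pooled buffer, e.g. when the backend is disposed.
  dispose(): void {
    this.free.forEach((list) => list.forEach((buffer) => this.allocator.destroy(buffer)));
    this.free.clear();
  }
}
```

Keying on exact byte size keeps the sketch simple; a real backend might instead round sizes up or cap the pool to bound GPU memory held by idle buffers.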

type:others

### Description #21618 This PR optimizes grouped conv by 1) making GPU memory access more sequential and 2) reusing the input data to reduce the number of global memory accesses. See `Conv|GroupedConv` op in...
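The snippet does not include the kernel itself. As background for the input-reuse point, here is a naive CPU reference of what a grouped convolution computes, simplified to 1-D, channels-first layout, stride 1 and no padding; these are simplifying assumptions, not the PR's actual layout or GPU kernel, and `groupedConv1d` is a hypothetical name. It shows that every output channel in group g reads the same inPerGroup input channels, which is the data an optimized kernel can load once and share across that group's output channels.

```typescript
// Naive reference for a grouped 1-D convolution (channels-first, stride 1, no padding).
// input:   [inChannels][width]
// weights: [outChannels][inChannels / groups][kernelSize]
// Returns: [outChannels][width - kernelSize + 1]
function groupedConv1d(input: number[][], weights: number[][][], groups: number): number[][] {
  const inChannels = input.length;
  const outChannels = weights.length;
  if (inChannels % groups !== 0 || outChannels % groups !== 0) {
    throw new Error("channel counts must be divisible by groups");
  }
  const inPerGroup = inChannels / groups;
  const outPerGroup = outChannels / groups;
  const kernelSize = weights[0][0].length;
  const outWidth = input[0].length - kernelSize + 1;

  const output: number[][] = [];
  for (let oc = 0; oc < outChannels; oc++) {
    // Output channel oc belongs to group g and only reads that group's input channels,
    // so all outPerGroup output channels of a group share the same input slice.
    const g = Math.floor(oc / outPerGroup);
    const row: number[] = new Array(outWidth).fill(0);
    for (let x = 0; x < outWidth; x++) {
      let sum = 0;
      for (let ic = 0; ic < inPerGroup; ic++) {
        const inRow = input[g * inPerGroup + ic];
        for (let k = 0; k < kernelSize; k++) {
          sum += inRow[x + k] * weights[oc][ic][k];
        }
      }
      row[x] = sum;
    }
    output.push(row);
  }
  return output;
}

// Two groups: output channel 0 sees only input channel 0, output channel 1 only input channel 1.
const y = groupedConv1d([[1, 2, 3], [10, 20, 30]], [[[1, 1]], [[1, 0]]], 2);
console.log(JSON.stringify(y)); // [[3,5],[10,20]]
```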

### Description ### Motivation and Context

### Description This PR makes the intermediate buffers generated in GQA static when a static KV cache is used, so that graph capture can be used for LLMs. The...

ep:WebGPU

This PR enables graph capture in the WebGPU provider, similar to the JSEP one (#18989). The limitations are similar to those of the JS/CUDA EPs: 1. Models with control-flow ops (i.e....

ep:WebGPU