Henry Ho

4 issues by Henry Ho

1. VectorWidthA and VectorWidthB for MFMA kernels
2. Wider local reads for tileMajorLDS
3. Fix bank conflicts caused by VectorWidthA/B
4. Prefetch all local reads for BF16/FP16/INT8 packing to the front...
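Item 3 is terse. The sketch below roughly illustrates how a wider per-thread vector read can cause LDS bank conflicts, and how padding each row is one common way to remove them. It assumes an LDS with 32 banks that are 4 bytes wide, and it models only 32 lanes reading at once. The function name `worst_bank_conflict` and the padding amounts are hypothetical. This is a generic illustration, not necessarily how the change above fixes the conflict.

```python
# Assumption: LDS with 32 banks, each 4 bytes wide; only 32 lanes are modeled
# (real hardware splits wider wavefronts and wide accesses into phases).
NUM_BANKS = 32
BANK_BYTES = 4


def bank_of(byte_addr):
    """Map a byte address to its LDS bank under the assumed layout."""
    return (byte_addr // BANK_BYTES) % NUM_BANKS


def worst_bank_conflict(lanes, row_stride_bytes, elem_bytes, vector_width):
    """Hypothetical helper: each lane reads `vector_width` contiguous elements
    from its own row. Returns the largest number of distinct 4-byte words
    that land in a single bank (1 means conflict-free for 1 word per lane)."""
    words_per_bank = {}
    for lane in range(lanes):
        base = lane * row_stride_bytes
        # Walk the lane's vector read one 4-byte word at a time.
        for offset in range(0, vector_width * elem_bytes, BANK_BYTES):
            addr = base + offset
            words_per_bank.setdefault(bank_of(addr), set()).add(addr // BANK_BYTES)
    return max(len(words) for words in words_per_bank.values())


# Rows of 64 FP16 values (128 bytes): every lane starts in bank 0.
print(worst_bank_conflict(32, 128, 2, 2))  # unpadded, 4-byte reads
print(worst_bank_conflict(32, 132, 2, 2))  # rows padded by 4 bytes
print(worst_bank_conflict(32, 128, 2, 8))  # unpadded, 16-byte reads
print(worst_bank_conflict(32, 132, 2, 8))  # padded, 16-byte reads
```

With 128-byte rows, every lane's first word falls in bank 0, so 32 lanes serialize on it. Padding each row by one 4-byte word shifts consecutive lanes into consecutive banks. For 16-byte reads, 32 lanes touch 128 words over 32 banks, so 4 words per bank is already the minimum the padded layout reaches.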

NoCI

Update F8BS TN gridbased for gfx942

gfx94x

Remove the dependency of data initialization on lda in hipblaslt-bench, so that the same data is benchmarked when the leading dimension differs.
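As a generic illustration of the idea (not the actual hipblaslt-bench code), the sketch below contrasts two ways of filling a column-major matrix stored with leading dimension `ld >= rows`. The first draws a random value for every slot in the buffer, including the padding rows, so the logical matrix changes whenever `ld` changes. The second draws values only for logical elements (i, j), in an order that does not depend on `ld`. All function names here are hypothetical.

```python
import random


def init_ld_dependent(rows, cols, ld, seed=42):
    """Draws one value per buffer slot, so padding consumes random numbers
    and the logical (i, j) values shift when ld changes."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(ld * cols)]


def init_ld_independent(rows, cols, ld, seed=42):
    """Draws one value per logical element in a fixed (j, i) order;
    padding slots (rows..ld-1 of each column) are left at 0.0."""
    assert ld >= rows
    rng = random.Random(seed)
    buf = [0.0] * (ld * cols)
    for j in range(cols):
        for i in range(rows):
            buf[i + j * ld] = rng.uniform(-1.0, 1.0)
    return buf


def logical_view(buf, rows, cols, ld):
    """Extract the rows x cols matrix, ignoring padding."""
    return [[buf[i + j * ld] for j in range(cols)] for i in range(rows)]


a = logical_view(init_ld_independent(3, 2, 3), 3, 2, 3)
b = logical_view(init_ld_independent(3, 2, 5), 3, 2, 5)
print(a == b)  # same logical data for ld = 3 and ld = 5
```

Only the independent variant gives identical logical inputs across leading dimensions. That lets a benchmark isolate the performance effect of `ld` from any effect of the data values.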

gfx94x