hcman2

Results 6 issues of hcman2

1. Support and enable heuristic GSU. 2. Fix activation function call bug.

noCI
gfx94x

Add a custom kernel for better performance.

Retune with LSU feature.

gfx94x

Fix bug when LSU > 1 and AFC = false.

gfx94x

ds_store_b128 takes more cycles than one mfma latency. Split each b128 store into b32 stores and schedule them across different mfma instructions if possible.

gfx94x
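
The split-and-interleave idea in the ds_store_b128 item can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual kernel generator code: the function name `split_and_interleave`, the instruction strings, and the `offset:` notation are hypothetical placeholders treated as opaque labels, and the sketch assumes one narrow store fits after each mfma.

```python
def split_and_interleave(mfmas, store_operand, parts=4):
    """Split one 128-bit store into `parts` 32-bit stores, one per MFMA slot.

    `mfmas` is a list of opaque instruction labels (hypothetical strings).
    `store_operand` is a placeholder label for the data being stored.
    """
    # One b128 store covers 16 bytes, so four b32 stores at byte offsets 0, 4, 8, 12.
    narrow = [f"ds_store_b32 {store_operand} offset:{4 * i}" for i in range(parts)]

    schedule = []
    for i, mfma in enumerate(mfmas):
        schedule.append(mfma)
        # Place at most one narrow store behind each MFMA.
        if i < len(narrow):
            schedule.append(narrow[i])

    # "If possible": with fewer MFMAs than narrow stores, the rest go at the end.
    schedule.extend(narrow[len(mfmas):])
    return schedule


if __name__ == "__main__":
    for line in split_and_interleave(["mfma_0", "mfma_1", "mfma_2", "mfma_3"], "v0"):
        print(line)
```

With four mfma slots available, each b32 store lands behind a different mfma; with fewer slots, the leftover stores are simply appended after the last one.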