Hotfix to add some requested tunings
Adds some requested tunings. This includes changes from:
https://github.com/ROCmSoftwarePlatform/rocBLAS/commit/bc5bd916f343428dacac4b7a9bab0be2a465ac8d
Also a clarifying question: this tuning was done on 104CU and then you copied the result back into the main aldebaran files?
I'm not sure what you mean by this, I used separate devices for tuning and testing of results. A 104CU node for the 104CU libraries and a 110CU node for the main libraries.
In the case where there was no existing 104CU library, I copied the corresponding main library and updated the arch field, and the added new entries on top of that.
@nielenventer if you add commits they won't be covered by the already started testing so you won't be able to merge the PR until a retest is done. Thus a separate PR would have been better if you thought the original commits need to be merged ASAP.
Ok I removed the fp16AltImpl versions from the PR after confirming with all parties it wasn't required for their use case. I'll add them via the normal channels.
I'll add them via the normal channels.
It would probably be easiest to do this in the same PR that pulls the changes back into develop after this PR is merged.
The CI failures look like the normals ones to me. After the force push, the commit in the PR once again matches the one used for the CQE tests, which are a Conditional Go.
I'm OK with merging so long as @TorreZuk and/or @amcamd are on board as well.
@amcamd can you approve too? The extra commit that was not tested was backed out so IMO is fine to merge. The manifest PR Lingamurty would still have to make to accept the commit after merge.