rocking
I also have a similar issue. In TensorFlow 1.5, GPU utilization is very low and it runs slower than on the CPU. However, in TensorFlow 1.4, the GPU...
The elementwise and maxpool backward kernels suffer from this issue. As discussed with @qianfengz, fixing this might require modifying StreamConfig.
https://ontrack-internal.amd.com/browse/LWPCK-190
@ppanchad-amd As mentioned in https://ontrack-internal.amd.com/browse/LWPCK-190, we can close this ticket.
I just submitted a PR to support AMD / ROCm on FlashAttention 2: https://github.com/Dao-AILab/flash-attention/pull/1010 This PR uses [composable_kernel](https://github.com/ROCm/composable_kernel) as the backend.
@wsippel Yes, the new PR only works on MI200 and MI300 for now.
> I have mi100s, would love to be able to use them

We found that MI100 may fail in some of the bf16 test cases. Hence, MI100 is not officially support...
> I would like to look into this bf16 issue. Is the cause well understood or in need of research?

We have been focusing on MI300 improvements recently, but MI100 is still...
> I would like to concur with ehartford. I'm trying to get the AMD folks to provide more info on the cause of a page fault during the tests which...
> I have 24 mi100s, I would much want to add support for mi100s, Is there anything I can do to help?

@ehartford You should ask your AMD sales to...