Results 24 comments of rocking

I also have a similar issue. In TensorFlow 1.5, GPU utilization is very low and it runs slower than on the CPU. ![screenshot from 2018-03-22 10 46 38](https://user-images.githubusercontent.com/9115697/37748532-238d1e64-2dbf-11e8-9a4e-e444708daeae.png) However, in TensorFlow 1.4, the GPU...

The elementwise and maxpool backward kernels suffer from this issue. As discussed with @qianfengz, fixing this might require modifying StreamConfig.

https://ontrack-internal.amd.com/browse/LWPCK-190

@ppanchad-amd As mentioned in https://ontrack-internal.amd.com/browse/LWPCK-190, we can close this ticket.

I just submitted a PR to support AMD / ROCm on FlashAttention 2: https://github.com/Dao-AILab/flash-attention/pull/1010. This PR uses [composable_kernel](https://github.com/ROCm/composable_kernel) as the backend.

@wsippel Yes, the new PR only works for MI200 and MI300 for now.

> I have mi100s, would love to be able to use them

We found that MI100 may fail some of the bf16 test cases. Hence, MI100 is not officially support...

> I would like to look into this bf16 issue. Is the cause well understood or in need of research?

We have been focusing on MI300 improvements recently, but MI100 is still...

> I would like to concur with ehartford. I'm trying to get the AMD folks to provide more info on the cause of a page fault during the tests which...

> I have 24 mi100s, I would much want to add support for mi100s, Is there anything I can do to help?

@ehartford You should ask your AMD sales to...