Fix MMA promotion interval assertions

Open LyricZhao opened this issue 1 year ago • 2 comments

For BLOCK_SIZE_K=256, GmmaFP8Accumulation has accum_promotion_interval=4 but mma_count_per_mainloop_iteration=8, so a non-FP8-fast-accum kernel never promotes to FP32 accumulators. This PR fixes the incorrect assertion by replacing 4 with the actual number of MMA instructions issued.
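
To make the mismatch concrete, here is a minimal host-side sketch of the counter arithmetic. It is not the CUTLASS implementation: `PromotionCounter` and `count_promotions` are hypothetical names, and the sketch assumes that promotion fires only when a running MMA counter is exactly equal to the promotion interval after each mainloop iteration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the promotion bookkeeping (illustration only).
struct PromotionCounter {
  std::uint32_t interval;       // plays the role of accum_promotion_interval
  std::uint32_t mmas_per_iter;  // plays the role of mma_count_per_mainloop_iteration
  std::uint32_t mma_count;      // MMAs issued since the last promotion
  std::uint32_t promotions;     // number of promotions performed so far

  // Called once per mainloop iteration (one k-tile).
  void step() {
    mma_count += mmas_per_iter;
    // Assumed check: exact equality between the counter and the interval.
    if (mma_count == interval) {
      ++promotions;
      mma_count = 0;
    }
  }
};

// Counts how many promotions happen over `k_tiles` mainloop iterations.
std::uint32_t count_promotions(std::uint32_t interval,
                               std::uint32_t mmas_per_iter,
                               std::uint32_t k_tiles) {
  assert(mmas_per_iter > 0);
  PromotionCounter counter{interval, mmas_per_iter, 0, 0};
  for (std::uint32_t i = 0; i < k_tiles; ++i) {
    counter.step();
  }
  return counter.promotions;
}
```

Under that assumption, `count_promotions(4, 8, 16)` returns 0, because the counter steps through 8, 16, 24, ... and never equals 4, whereas `count_promotions(8, 8, 16)` returns 16, one promotion per iteration.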

LyricZhao avatar Jul 17 '24 06:07 LyricZhao

Could anyone reply to this? I do think it's a serious bug: with BLOCK_SIZE_K=256, the FP8 training loss curve was much worse than non-FP8-fast-accum.

LyricZhao avatar Aug 09 '24 09:08 LyricZhao

@IonThruster

thakkarV avatar Aug 09 '24 12:08 thakkarV

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 08 '24 13:09 github-actions[bot]

@yzhaiustc can we please put it on the list for 3.6?

thakkarV avatar Sep 08 '24 15:09 thakkarV

sure.

yzhaiustc avatar Sep 09 '24 16:09 yzhaiustc

@manishucsd

hwu36 avatar Sep 16 '24 16:09 hwu36