TransformerEngine

change softmax_lse correction of CP to FP32

Open · xrennvidia opened this pull request 11 months ago • 1 comment

Description

  • The softmax_lse correction currently runs in FP64; we can lower it to FP32.
  • Use log1p so the correction is consistent with PR1401 (see the sketch below).
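
For context, here is a minimal PyTorch sketch of this kind of correction (an editor's illustration with a hypothetical helper name, not the PR's actual implementation): merging two per-rank softmax log-sum-exp tensors entirely in FP32, using log1p.

```python
import torch

def combine_softmax_lse(lse_a: torch.Tensor, lse_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: compute log(exp(lse_a) + exp(lse_b)) in FP32."""
    lse_max = torch.maximum(lse_a, lse_b)
    lse_min = torch.minimum(lse_a, lse_b)
    # exp(lse_min - lse_max) <= 1, so the exp() cannot overflow in FP32;
    # log1p stays accurate when that term is near 0 (one rank dominates).
    return lse_max + torch.log1p(torch.exp(lse_min - lse_max))
```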

Type of change

  • [ ] Documentation change (change only to the documentation, either a fix or new content)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Infra/Build change
  • [x] Code refactoring

Checklist:

  • [x] I have read and followed the contributing guidelines
  • [x] The functionality is complete
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes

xrennvidia · Mar 07 '25 03:03

/te-ci pytorch L1

xrennvidia · Mar 07 '25 21:03

/te-ci pytorch L1

xrennvidia · Apr 28 '25 20:04

/te-ci pytorch L1

xrennvidia · Apr 29 '25 02:04

CI failures are not related to this PR.

log1p(x) provides more accuracy than log(1 + x) when x is close to 0.
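
As a quick illustration (editor's sketch; printed values are approximate):

```python
import torch

x = torch.tensor(1e-6, dtype=torch.float32)
print(torch.log(1 + x))  # ~9.5367e-07: 1 + x rounds in FP32 before the log
print(torch.log1p(x))    # ~1.0000e-06: full accuracy for small x
```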

softmax_lse is downcast from double to float because cuDNN produces softmax_lse in FP32, and combining multiple copies of it across CP ranks should still stay within float limits.

cyanguwa · Apr 29 '25 22:04
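
As a rough check on the float-limits argument above (editor's note, with illustrative numbers): merging partial softmax sums across CP ranks can raise the log-sum-exp by at most log(cp_size).

```python
import math

# log(sum_i exp(lse_i)) <= max_i lse_i + log(cp_size), so combining
# FP32 per-rank lse values adds at most log(cp_size) on top of the max.
cp_size = 16
print(math.log(cp_size))  # ~2.77 -- negligible next to FP32's max (~3.4e38)
```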