Make GroupedLinear accept FP8 input

Open • Autumn1998 opened this issue 10 months ago • 1 comment

Description

Make GroupedLinear accept blockwise FP8 input.

This relies on https://github.com/NVIDIA/TransformerEngine/pull/1707 for the compact scaling factors.
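A minimal pure-Python sketch of what a grouped linear layer consuming pre-quantized blockwise FP8 input computes. It assumes 1 x B row-wise tiles with one scale per tile (scale = tile amax / 448, the E4M3 maximum) and uses a tiny B = 4 for readability (real recipes typically use larger tiles). All names (round_to_e4m3, quantize_blockwise, grouped_linear_fp8, grouped_linear_ref) are hypothetical and are not Transformer Engine APIs. The [rows][tiles] scale list is only an illustrative layout, not necessarily the compact format from PR 1707, and a real kernel would feed FP8 data and scales to the GEMM rather than dequantizing in Python.

```python
import math

# Hypothetical helpers for illustration; not Transformer Engine APIs.
E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)      # -6 is the smallest normal exponent; below it, subnormals
    step = 2.0 ** (exp - 3)   # 3 mantissa bits -> 8 representable values per binade
    return math.copysign(min(round(a / step) * step, E4M3_MAX), x)


def quantize_blockwise(x, block):
    """Quantize every 1 x `block` tile of each row with its own scale.

    Returns (q, scales) where scales[r][b] dequantizes tile b of row r
    (one float per tile, in a plain [rows][tiles] layout).
    """
    q, scales = [], []
    for row in x:
        q_row, s_row = [], []
        for start in range(0, len(row), block):
            tile = row[start:start + block]
            amax = max(abs(v) for v in tile)
            scale = amax / E4M3_MAX if amax > 0 else 1.0
            q_row.extend(round_to_e4m3(v / scale) for v in tile)
            s_row.append(scale)
        q.append(q_row)
        scales.append(s_row)
    return q, scales


def grouped_linear_fp8(q, scales, block, weights, m_splits):
    """y_g = dequant(x_g) @ W_g^T for each group g.

    Rows (tokens) are split by m_splits; each scale belongs to a single
    row, so the scales are split with exactly the same row ranges.
    """
    assert sum(m_splits) == len(q) == len(scales)
    out, start = [], 0
    for w, m in zip(weights, m_splits):
        for q_row, s_row in zip(q[start:start + m], scales[start:start + m]):
            x_row = [v * s_row[i // block] for i, v in enumerate(q_row)]
            out.append([sum(a * b for a, b in zip(x_row, w_row)) for w_row in w])
        start += m
    return out


def grouped_linear_ref(x, weights, m_splits):
    """High-precision reference for the same grouped GEMM."""
    out, start = [], 0
    for w, m in zip(weights, m_splits):
        for row in x[start:start + m]:
            out.append([sum(a * b for a, b in zip(row, w_row)) for w_row in w])
        start += m
    return out


# Usage: 4 tokens, hidden size 8, tiles of 4, two groups with 3 outputs each.
x = [[0.5, -1.25, 3.0, 0.75, 2.0, -0.5, 1.5, -4.0],
     [1.0, 2.5, -0.25, 0.125, -3.5, 0.6, 0.9, 1.1],
     [-2.0, 0.3, 0.7, -1.9, 5.0, 4.5, -0.8, 0.2],
     [0.4, -0.6, 1.2, -2.2, 0.05, 0.15, -0.35, 0.45]]
weights = [[[0.1 * (i + j) for i in range(8)] for j in range(3)],          # group 0
           [[0.2 - 0.05 * i * j for i in range(8)] for j in range(3)]]     # group 1
m_splits = [1, 3]

q, scales = quantize_blockwise(x, block=4)
y_fp8 = grouped_linear_fp8(q, scales, 4, weights, m_splits)
y_ref = grouped_linear_ref(x, weights, m_splits)
```

Because each scale belongs to exactly one token row, splitting the input by m_splits splits the scales along the same row ranges, so the already-quantized input can be handed to each group without re-quantization.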

TODO:

[ ] Documentation change (change only to the documentation, either a fix or a new content)
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Infra/Build change
[ ] Code refactoring

Please list the changes introduced in this PR:

Apr 21 '25 11:04 Autumn1998

Please take a look at https://github.com/NVIDIA/TransformerEngine/pull/1707.

FP8 gather for dense models with sequence parallelism should be supported through the new need_compact usage API. Suggestions are highly welcome!

Apr 26 '25 01:04 zhongbozhu
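A sketch of why gathering FP8 data plus scales along the sequence dimension can stand in for a high-precision gather, under the same assumptions as a blockwise 1 x B row-wise E4M3 scheme with one scale per tile. The helper names (round_to_e4m3, quantize_rows, all_gather_tokens) are hypothetical, the "all-gather" is a plain list concatenation standing in for a collective, and the need_compact API itself is not modeled. Each rank quantizes its own token shard, and the gathered result is compared against quantizing the full tensor.

```python
import math

# Hypothetical helpers for illustration; not Transformer Engine APIs.
E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                 # a = m * 2**e with 0.5 <= m < 1
    step = 2.0 ** (max(e - 1, -6) - 3)   # spacing of E4M3 values near a
    return math.copysign(min(round(a / step) * step, E4M3_MAX), x)


def quantize_rows(x, block):
    """1 x `block` tile quantization: returns (FP8 values, one scale per tile)."""
    q, scales = [], []
    for row in x:
        tiles = [row[i:i + block] for i in range(0, len(row), block)]
        # amax / 448 per tile; fall back to 1.0 for an all-zero tile
        s_row = [max(abs(v) for v in t) / E4M3_MAX or 1.0 for t in tiles]
        q.append([round_to_e4m3(v / s_row[i // block]) for i, v in enumerate(row)])
        scales.append(s_row)
    return q, scales


def all_gather_tokens(shards):
    """Stand-in for an all-gather along the sequence (token) dimension."""
    return [row for shard in shards for row in shard]


# Hypothetical setup: 2 sequence-parallel ranks, each holding 2 of the 4 tokens.
x = [[0.5, -1.25, 3.0, 0.75, 2.0, -0.5, 1.5, -4.0],
     [1.0, 2.5, -0.25, 0.125, -3.5, 0.6, 0.9, 1.1],
     [-2.0, 0.3, 0.7, -1.9, 5.0, 4.5, -0.8, 0.2],
     [0.4, -0.6, 1.2, -2.2, 0.05, 0.15, -0.35, 0.45]]
shards = [x[0:2], x[2:4]]
block = 4

# FP8 gather: each rank quantizes locally, then FP8 data and scales are gathered.
local = [quantize_rows(s, block) for s in shards]
q_gathered = all_gather_tokens([q for q, _ in local])
s_gathered = all_gather_tokens([s for _, s in local])

# Reference: gather in high precision first, then quantize the full tensor.
q_full, s_full = quantize_rows(all_gather_tokens(shards), block)
```

The two paths agree because every 1 x B tile lies inside a single token row, so no tile crosses a shard boundary; tiles that span the token dimension (for example 128 x 128 blocks) would not, in general, produce the same scales when a shard boundary falls inside a tile. Assuming FP32 scales and B = 128, each gathered element costs 1 + 4/128 = 1.03125 bytes instead of 2 bytes for a BF16 gather.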