[QST] Can hopper_int4_fp8_gemm support Scale with zero-point mode?
Hi,
I tested this file examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu,
and it shows about a 10%-20% performance improvement for some input sizes.
Is there a plan to support the scale with zero-point mode?
Thank you.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
@ZZBoom Is this still a blocker for you? In principle, the lookup table approach should work for zero-points, since we can just encode the zero-points in the lookup table. However, the lookup table currently encodes only 8 fp8 values, while int4 can represent 16 values. The way around this today is to encode only the negative results and use XOR to generate the positive results on the fly.
If zero-points are encoded, the XOR trick no longer works, since the results are no longer symmetric about zero. Therefore, a lookup table encoding all 16 values (i.e., 128 bits) must be used. However, I think the current CUTLASS code base doesn't allow the TMA descriptor to load 128-bit values, so the pipeline may need some nontrivial modification. For instance, keep the 64-bit lookup table but double its size and add an extra dimension to its layout.
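To make the difference concrete, here is a minimal host-side C++ sketch of the two table shapes. It is not the CUTLASS implementation: the sign-plus-3-bit-index code layout, the helper names (`encode_e4m3`, `make_symmetric_lut`, `make_zero_point_lut`, and so on), and the simplified E4M3 encoder (normal, exactly representable values only) are all assumptions made for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical minimal FP8 E4M3 helpers. Valid only for normal values that
// are exactly representable (small integers times a power-of-two scale).
inline std::uint8_t encode_e4m3(float v) {
    std::uint8_t sign = std::signbit(v) ? 0x80 : 0x00;
    float a = std::fabs(v);
    if (a == 0.0f) return sign;
    int e = 0;
    float m = std::frexp(a, &e);   // a = m * 2^e with m in [0.5, 1)
    int biased = (e - 1) + 7;      // E4M3 exponent bias is 7
    int mant = static_cast<int>(std::lround((2.0f * m - 1.0f) * 8.0f));
    return static_cast<std::uint8_t>(sign | (biased << 3) | mant);
}

inline float decode_e4m3(std::uint8_t b) {
    int e = (b >> 3) & 0xF;
    int m = b & 0x7;
    float mag = (e == 0) ? std::ldexp(m / 8.0f, -6)
                         : std::ldexp(1.0f + m / 8.0f, e - 7);
    return (b & 0x80) ? -mag : mag;
}

using Lut8 = std::array<std::uint8_t, 8>;  // 8 fp8 bytes = 64 bits
using Lut2x8 = std::array<Lut8, 2>;        // two 64-bit halves = 128 bits

// Scale only: store the non-positive results -(m * scale) for m = 0..7.
inline Lut8 make_symmetric_lut(float scale) {
    Lut8 lut{};
    for (int m = 0; m < 8; ++m) lut[m] = encode_e4m3(-(m * scale));
    return lut;
}

// Assumed code layout: bit 3 set means negative, low 3 bits index the table.
// A non-negative code XORs the fp8 sign bit to turn -x into +x.
inline std::uint8_t lookup_symmetric(const Lut8& lut, std::uint8_t code) {
    std::uint8_t flip = static_cast<std::uint8_t>((~code & 0x8) << 4);
    return static_cast<std::uint8_t>(lut[code & 0x7] ^ flip);
}

// Scale with zero-point: (q - zero_point) * scale for q = 0..15 gives 16
// results that are not mirror pairs, so every entry is stored explicitly.
// The extra leading dimension selects one of the two 8-entry halves.
inline Lut2x8 make_zero_point_lut(float scale, int zero_point) {
    Lut2x8 lut{};
    for (int q = 0; q < 16; ++q)
        lut[q >> 3][q & 0x7] = encode_e4m3((q - zero_point) * scale);
    return lut;
}

inline std::uint8_t lookup_zero_point(const Lut2x8& lut, std::uint8_t code) {
    return lut[code >> 3][code & 0x7];
}
```

In this sketch the scale-only path needs 8 bytes plus a sign flip, while the zero-point path needs the full 2 x 8 table; the high bit of the code picks the half instead of flipping a sign. How such a 2 x 8 table would be loaded through TMA inside the actual mainloop is not covered here.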
Thank you for your reply. I now fully understand the fp8 scale optimization approach. "Is this still a block for you?" Yes, I am in great need of this feature. From LLM training to production, asymmetric quantization is much friendlier than symmetric quantization in terms of training efficiency and model accuracy. A high-performance asymmetric implementation is therefore badly needed for inference.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.