Tim
> It depends what you mean by FP4 training. We do not yet plan to support full FP4 training (as in, both forward and backward) since there is no evidence...
> If anyone knows any details about the implementation beyond the code itself, please do share. Thank you for your reply. The current release version uses FP4...
> https://research.colfax-intl.com/cutlass-tutorial-sub-byte-gemm-on-nvidia-blackwell-gpus/ Thank you for your reply. I have carefully read the reference documentation you provided and conducted some experiments. I discovered something interesting: when packing 4 consecutive fp6 e3m2...
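To make the fp6 packing in the quote above concrete: four consecutive 6-bit e3m2 values occupy 4 × 6 = 24 bits, i.e. exactly 3 bytes. The sketch below is only an illustration of that general sub-byte packing idea; the specific bit ordering, and the actual layout CUTLASS uses for Blackwell sub-byte GEMMs, are assumptions here and may differ from the real implementation.

```python
# Hypothetical sketch: pack four 6-bit fp6 (e3m2) bit patterns into 3 bytes.
# The little-endian field order chosen here is an assumption, not the
# CUTLASS/Blackwell layout.

def pack_fp6x4(vals):
    """Pack four 6-bit fields (ints in [0, 63]) into 3 bytes."""
    assert len(vals) == 4 and all(0 <= v < 64 for v in vals)
    # Concatenate the four 6-bit fields into one 24-bit word,
    # with vals[0] in the lowest-order bits.
    word = vals[0] | (vals[1] << 6) | (vals[2] << 12) | (vals[3] << 18)
    return bytes([word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF])

def unpack_fp6x4(packed):
    """Recover the four 6-bit fields from 3 packed bytes."""
    word = packed[0] | (packed[1] << 8) | (packed[2] << 16)
    return [(word >> (6 * i)) & 0x3F for i in range(4)]

if __name__ == "__main__":
    vals = [0b000001, 0b101010, 0b111111, 0b010101]
    packed = pack_fp6x4(vals)
    assert unpack_fp6x4(packed) == vals
    print(packed.hex())
```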
> Ok, this is actually a slightly larger can of worms than I anticipated (in addition to the problem you flagged there are random things like CMake