cutile-python
[Question] Clarification on FP8 Micro-block Scaling and FP4 Support Timeline
Hi cuTile team,
I have two specific questions regarding the support for Blackwell-specific hardware features:
- **Automatic Micro-block Scaling for FP8:** When using `fp8` with `ct.matmul`, how is the (1x16) micro-block scaling handled?
  - Automation: Does the Tile IR compiler automatically handle the scaling logic and the 5th-gen Tensor Core invocation under the hood?
  - Explicit scaling: If it is not fully automatic, how should we provide the scale-factor tiles to the `ct.matmul` operator? Currently, the `ct.matmul(A, B)` signature seems to accept only data tiles. Is there a plan for a signature like `ct.matmul(A, B, A_scale, B_scale)` (see the sketch after this list)?
- **NVFP4 (FP4) Support Roadmap:** The current documentation and samples focus on `fp8` and `bf16`. Since Blackwell's peak throughput is tied to NVFP4, when can we expect support for 4-bit narrow-precision tiles in cuTile Python?
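To make the explicit-scaling question concrete, here is roughly what I have in mind. This is only a sketch: the `import cutile as ct` alias and the `ct.matmul(A, B, A_scale, B_scale)` signature are my own guesses based on the `ct.` prefix in the samples, not existing API.

```python
import cutile as ct  # assumed import alias; the samples use the `ct.` prefix


def fp8_scaled_matmul(A, B, A_scale, B_scale):
    # What seems possible today: only data tiles are passed, so any
    # 1x16 micro-block scaling would have to be handled implicitly
    # by the compiler.
    C = ct.matmul(A, B)

    # Hypothetical signature I am asking about (does not exist today):
    # pass the per-micro-block scale-factor tiles explicitly so the
    # 5th-gen Tensor Core scaled-MMA path can be targeted directly.
    # C = ct.matmul(A, B, A_scale, B_scale)
    return C
```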
Thanks for this great library!