y-sq
Summary: The issue: When using float8 training with FSDP, we have these tensors in the forward_backward graph: - Without fp8-all-gather: original_weight (all-gather output, sharded) - fp8_weight - fp8_weight_transpose (needed in...
Summary: The diff modifies the `padding` option and adds tests with `compile`: * For a scaled_mm of shape MxKxN, the current `inner_padding` option pads only the `K` dimension. However, if...
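As a rough illustration of why padding only the inner `K` dimension is safe, the sketch below zero-pads `K` for an MxK by KxN product and checks that the result is unchanged. It uses plain Python lists rather than the real scaled_mm kernel. The helper names (`pad_to_multiple`, `pad_inner`, `matmul`) and the alignment of 16 are assumptions made for this example, not taken from the diff.

```python
def pad_to_multiple(n, multiple=16):
    """Round n up to the next multiple (16 is an assumed alignment)."""
    return -(-n // multiple) * multiple


def pad_inner(a, b, multiple=16):
    """Zero-pad the shared K dimension of a (M x K) and b (K x N).

    Hypothetical helper: appends zero columns to a and zero rows to b.
    """
    k = len(b)
    n = len(b[0])
    extra = pad_to_multiple(k, multiple) - k
    a_padded = [row + [0.0] * extra for row in a]
    b_padded = b + [[0.0] * n for _ in range(extra)]
    return a_padded, b_padded


def matmul(a, b):
    """Naive matrix product over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # M=2, K=3
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # K=3, N=2
a_p, b_p = pad_inner(a, b)

# The padded K entries contribute only zero terms, so the product is unchanged.
assert matmul(a_p, b_p) == matmul(a, b)
```

Padding `M` or `N` would instead change the output shape, so the result would have to be sliced back afterwards.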
**Summary**: * Added a config option `defer_reduction_split`. When it is enabled and `num_splits` yields a split greater than 1, return `ReductionHint.DEFERRED_SPLIT, 1` instead. * In the scheduler, when fusing nodes, if a node is...