[FEA] distributed_helper used to allow mem_order="acquire" in spin_lock_wait, but in 4.3.1 pip package only "relaxed" is exposed via

Open aleozlx opened this issue 2 months ago • 0 comments

Which component requires the feature?

CuTe DSL

Feature Request

Is your feature request related to a problem? Please describe.

distributed_helper used to allow mem_order="acquire" in spin_lock_wait, but in the 4.3.1 pip package only "relaxed" is exposed, via spin_lock_atom_cas_relaxed_wait. It would be useful to have an "acquire" version exposed as well.

Describe the solution you'd like

One possibility: spin_lock_atom_cas_acquire_wait

Describe alternatives you've considered

Keep the mem_order string arg. Maybe some of our source code is ported from examples? I'm not sure, need to check ...
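To make the mem_order alternative concrete, here is a rough sketch of a thin wrapper that keeps the string argument and dispatches to per-ordering functions. It is pure Python and does not import cutlass: `make_spin_lock_wait`, `fake_relaxed_wait` and `fake_acquire_wait` are hypothetical names, the stand-ins only record calls, and the real argument lists of spin_lock_atom_cas_relaxed_wait (and of the proposed acquire variant) are not assumed here, so arguments are simply passed through.

```python
from typing import Any, Callable, Dict, List, Tuple


def make_spin_lock_wait(impls: Dict[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Build a spin_lock_wait(..., mem_order=...) wrapper over per-ordering functions."""

    def spin_lock_wait(*args: Any, mem_order: str = "relaxed", **kwargs: Any) -> Any:
        # Look up the implementation registered for the requested ordering.
        try:
            impl = impls[mem_order]
        except KeyError:
            raise ValueError(
                f"unsupported mem_order {mem_order!r}; available: {sorted(impls)}"
            ) from None
        # Forward everything else unchanged; the real signatures are not assumed.
        return impl(*args, **kwargs)

    return spin_lock_wait


# Hypothetical stand-ins that only record how they were called.
# In real code these would be spin_lock_atom_cas_relaxed_wait and the
# requested (not yet existing) spin_lock_atom_cas_acquire_wait.
calls: List[Tuple[str, tuple, dict]] = []


def fake_relaxed_wait(*args: Any, **kwargs: Any) -> None:
    calls.append(("relaxed", args, kwargs))


def fake_acquire_wait(*args: Any, **kwargs: Any) -> None:
    calls.append(("acquire", args, kwargs))


spin_lock_wait = make_spin_lock_wait(
    {"relaxed": fake_relaxed_wait, "acquire": fake_acquire_wait}
)

# Placeholder arguments; this call is routed to the acquire stand-in.
spin_lock_wait("lock_ptr", 1, mem_order="acquire")
```

With a wrapper like this, call sites ported from older examples could keep passing mem_order, and an unsupported ordering fails with a clear ValueError instead of a missing-attribute error.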

Additional context

Used by:
https://github.com/aleozlx/flashinfer/blob/442dec9bea569f53e01b799a2e0328c2ea30bbca/flashinfer/cute_dsl/gemm_allreduce_two_shot.py#L1399-L1403
https://github.com/NVIDIA/cutlass/blob/v4.3.1/python/CuTeDSL/cutlass/utils/distributed_helpers.py#L136

Dec 05 '25 02:12 aleozlx