[SYCL2020] Implement sycl::atomic_fence
Hi Aksel,
Less than a year later, I'm trying to bring it full circle here. 🥳 This (properly) closes #767.
Let me know what you think so far. I haven't implemented the SSCP path because I could only find existing implementations for barriers but not memory fences, and so I think I should wait for you to tell me where and how you want this. 🙂
Cheers, -N
Can we implement the old
mem_fenceon top of the newer one?
Done.
For SSCP I would suggest looking at barrier - every barrier is effectively a combination of a memfence and a blocking synchronization primitive. So probably we can factor the
mem_fenceout from there :)
Done.
~Can I get the declaration of __spirv_MemoryBarrier from spirv_ops.hpp in fence.cpp or better not?~ I did not, just in case we want a clear separation.
~As written, the semantics for seq_cst + device for explicit multipass SPIR-V and SSCP SPIR-V is different as the memory semantics flags differ. I suspect the former includes MemorySemanticsMask::WorkgroupMemory because mem_fence could (and still can) use fence_space::global_and_local, but we were using it even with fence_space::global_space. In the barrier implementation we do not include MemorySemanticsMask::WorkgroupMemory. What should we do here?~
I've completely rewritten my implementation so it reflects my understanding of the intended semantics.
There are now no differences for SPIR-V.
For HIP-like devices, the SSCP path might be more precise, in the sense that the IR fence primitives we end up calling expose more choices than just __threadfence_block() and __threadfence().
It may well be those IR primitives only implement sequential consistency at the workgroup and device scopes as well - I dunno - but we don't care about that... Indeed, if that were to change, SSCP compilations would automatically be taking advantage.