Dat3M Understanding C-Litmus conditional events

I took a look into the special events we generate for C-Litmus (Linux) tests as they do not compile to our internal LISA-like core language. These special events consist of a few types of ReadCond events that, unlike their name, always perform a read but they also have a conditional defined on top of the read value (i.e., it depends on the resultRegister). Then there are RMWStoreCond and FenceCond that both are actually conditional on the value read by ReadCond.

Let's look at RMWStoreCond first. This one performs a store operation if the conditional associated with its RMWReadCond is satisfied. This is effectively nothing but a conditional in between the Read and the Store: RMWReadCond;RMWStoreCond <=> RMWRead;CondJump(Label);RMWStore;Label And indeed, this event is only generated in the compilation of atomic operations like CAS or add_unless.

Question: What is the reason that we do not insert branching there instead of using these special events? Is it because of added ctrl dependencies? Under Linux, those control dependencies do not propagate past the merging point so it would at most add a single ctrl edge between the read and the store. And if those do make problems, can we add special branching instructions that do not cause ctrl-edges (or simply a tag for existing branching instructions). In a discussion about supporting IMM, we also found the need for such "invisible" branching to properly model CAS operations on the language level.

Now let's look at FenceCond. Its working is basically the same as that of RMWStoreCond in that it executes if and only if the associated read satisfies some condition. Again, a FenceCond that is placed after a ReadCond may as well be modelled by regular branching operations I think. However, we frequently generate this strange code during compilation:

FenceCond(condRead)
condRead
condStore
local
FenceCond(condRead)

where there is a conditional fence before the load it depends on. That means this barrier only executes if the subsequent read satisfies some condition. This obviously cannot be modelled by adding simple branching instructions.

Question: How does it make sense to place a fence before a read but only execute it if that read satisfies something? Does Linux really have such kind of barriers? Is it possible to make this Fence unconditional or move it past the read so that is can be modelled via branching?

Jan 11 '22 17:01 ThomasHaas

Don't forget the ctrl edge between the RMWReadCond event and its connected RMWWriteCond, as well as the dependency to the compared value. It might be required to work with Assume and NonDet, instead, since those constructions break dependency chains. I once tried the naive way and broke several tests.

Jan 14 '22 19:01 xeren

I think I addressed the issue of ctrl-edges. But as I said, under Linux this would be a single edge between the read and the write, which I expect to not be reorderable due to atomicity anyways. Similarly, I wouldn't expect the load that generated the value that is compared against to be reorderable to after the atomic operation either. But I guess here you can have an arbitrary number of additional ctrl-dependency edges. To me it seems that these events are an artifact of the C litmus tests. Also Assume and NonDet are non-existent in practice (and also in litmus tests in particular), so those can be safely ignored.

Interestingly, kernel.org says

atomic_cmpxchg must provide explicit memory barriers around the operation, although if the comparison fails then no memory ordering guarantees are required.

This is why we have the conditional fence before the read operation. But I'm still wondering if it is enough to only surround the write with fences, so that in the fail case there are no fences and in the success case only the read is performed before the fence. Because if this is not the case, then how do operational models of linux work to begin with, since they will have a hard time retroactively fitting a fence into the picture? In case this is not possible, we will have to extend our core language to support these types of events. For example, it should be enough to equate the exec-variable of the fence with the exec-variable of the store so that either both or none are executed.

Jan 14 '22 23:01 ThomasHaas

I tested changing the compilation of CAS from

FenceCond(condRead)
condRead
condStore
local
FenceCond(condRead)

to

rmwRead
condJump(label)
Fence // This one is actually not needed
rmwStore
Fence
label
local

and as expected, the tests run through just fine. So at least for our current set of litmus tests, these special linux events are unnecessary.

Jan 15 '22 14:01 ThomasHaas

Judging from this ("read read" should be "read reg" I think) we don't want to have ctrl dependency between those events.

Unfortunately the LKMM paper does not discuss CAS or add_unless.

I think it would be helpful to have #138 to have more cases to tests if these special linux events are necessary.

Jan 16 '22 07:01 hernanponcedeleon

We do not want control-dependencies because LKMM is a language level WMM and so it should be interpreted over uncompiled code I believe (and that code has no conditionals there). It is similar to why we need cas-dep for IMM. But the point is that those additional ctrl-deps do not make any problems because they only order what is already ordered. And if they do, we should have a way to mark CondJump so that it won't produce ctrl-deps while still giving the conditional semantics. The only real problem is FenceCond which cannot be simulated with CondJumps and I don't know how to go about that one. But I also think that this event is very odd, possibly incorrect.

Jan 16 '22 12:01 ThomasHaas

Using the LKMM documentation referenced in #211, I think we can get rid of those somewhat magical FenceCond events. The reason is based on the following excerpt:

===== From ordering.txt =====
Note that conditional RMW atomic operations such
as cmpxchg() are only guaranteed to provide ordering when they succeed. 
[...] Non-value-returning RMW atomic operations (that is, those
with void return types) do not guarantee any ordering whatsoever.  Nor do
value-returning RMW atomic operations whose names end in _relaxed.
===== From cheatsheet.txt =====
A relaxed operation is either READ_ONCE(), WRITE_ONCE(),
		  a *_relaxed() RMW operation, an unsuccessful RMW
		  operation, a non-value-returning RMW operation such
		  as atomic_inc(), or one of the atomic*_read() and
		  atomic*_set() family of operations.

It says that a failing CAS is a relaxed operation and relaxed operations do not cause ordering, meaning the load we generate should always be a relaxed load. Furthermore, only on success there should be induced ordering. I think this can be achieved by inserting fences in the success branch. The compilation for CAS would then be

rmwload[relaxed]
condJump(Label) // Here we need an invisible branching that does not add ctrl-deps. This is easily achieved with a custom tag
Fence[...] // Possibly with appropriate tags (e.g. acq, if the load was supposed to be acq)
rmwstore
Fence[...]
Label

The only problem I can see is that on a successful CAS, some operations may be moved inbetween the load and the store. But I'm not sure if this can cause any actual problems, since the important ordering between the operations before the CAS and the operation after the CAS is achieved, as well as the ordering of operations before the CAS with the actual effect (the store) of the CAS.

Jan 18 '22 17:01 ThomasHaas

I think any operation that is not intended to be placed in between the load and the store will be ordered by ppo so probably the compilation scheme you suggests is safe

Jan 18 '22 18:01 hernanponcedeleon