Refactoring the write barrier API

Open wks opened this issue 1 year ago • 3 comments

TL;DR: This issue addresses some recent discussion about write barriers and omitting the slot and target parameters of the write barrier API functions.

We had some previous discussion about generalizing the subsuming barrier API for tagged references and atomic RMW (including CAS). This issue does not discuss subsuming barriers in depth, except to acknowledge that the most general form of subsuming barrier sucks.

The most general write barrier API sucks!

The write barrier that is ultimately general w.r.t. the field representation (pointer, compressed pointer, offsetted pointer, tagged pointer, handle, etc.), the operation (store, compare-and-swap, atomic exchange, etc.), whether the object is multi-copy (like Sapphire where write barriers write to both the old copy and the new copy), whether non-reference fields need barriers (like Sapphire), and the kind of barrier (object-logging barrier, field-logging barrier, SATB barrier, XOR zone barrier, generational barrier, etc.) is a subsuming barrier that lets the VM binding implement the actual write operation. It takes multiple object parameters, an optional slot address, and optional old and new targets, and it can be very complicated.

fn object_reference_write(mutator: Mutator,
    // The object
    object: ObjectReference,
    // For GC like Sapphire, the "new copy" or "old copy" of the current object
    mirrored_object: Option<ObjectReference>,
    // Only used by field-logging barriers (LXR).  Other barriers just let `operation` do the actual write.
    slot_addr: Option<Address>,
    // The old target in the slot, or None if the slot was holding NULL, None, nil, false, true, nothing, missing, undef, small integer, etc.
    old_target: Option<ObjectReference>,
    // The target object, or None if storing NULL, None, nil, false, true, nothing, missing, undef, small integer, etc.
    new_target: Option<ObjectReference>,
    // A routine provided by VM binding to do the actual write/swap/CAS.
    // Return the actual old target if different from the `old_target` argument.
    // This can happen in CAS.
    operation: impl FnOnce() -> Option<ObjectReference>);

(p.s. Ask @wks for an example of SATB barrier for AtomicReference.compareAndExchangeAcquire in OpenJDK, or figure it out by yourself)

An API like this should be able to handle plans like GenImmix (ObjectBarrier), LXR (FieldBarrier), CMS (SatbBarrier), Sapphire (multi-write barrier), G1 (XOR zone barrier), @wenyuzhao's hypothetical alternative generational barrier design (generational barrier), etc., and handle VMs like OpenJDK (needs atomic swap and CAS), CRuby (needs tagged reference), V8 (needs tagged reference and multiple flavors of NULL values), etc.

But an API like this will surely scare away 9 out of 10 PhD students or even professors in the field of language/VM implementation, not to mention developers who have "absolutely no idea how to write a programming language".

What's worse, if a VM wants to be fully general, it will need to apply such a subsuming barrier for every non-reference field write, too, just in case the current plan is Sapphire. But that'll slow down all programs, perhaps too slow even for debug builds.

What should we do?

Be practical. Provide a few flavors of pre-post barriers.

MMTk currently only has the ObjectBarrier in the master branch, and it has the field-logging barrier in the lxr branch. Considering common SATB barriers, advancing/retreating barriers for concurrent MS, I think a few kinds of barrier API functions will be sufficient to cover all barriers we currently have, and should be general enough for additional kinds of barriers.

Barrier forms

fn object_reference_write_pre_o(mutator: Mutator, object: ObjectReference);
fn object_reference_write_post_o(mutator: Mutator, object: ObjectReference);
fn object_reference_write_pre_ot(mutator: Mutator, object: ObjectReference, old_target: Option<ObjectReference>);
fn object_reference_write_post_ot(mutator: Mutator, object: ObjectReference, new_target: Option<ObjectReference>);
fn object_reference_write_pre_os(mutator: Mutator, object: ObjectReference, slot_addr: Address);
fn object_reference_write_post_os(mutator: Mutator, object: ObjectReference, slot_addr: Address);
fn object_reference_write_pre_ost(mutator: Mutator, object: ObjectReference, slot_addr: Address, old_target: Option<ObjectReference>);
fn object_reference_write_post_ost(mutator: Mutator, object: ObjectReference, slot_addr: Address, new_target: Option<ObjectReference>);

The suffixes o, s and t mean object, slot_addr and target, respectively. The pre barriers only take the old target, while the post barriers only take the new target. The target can be None if it is a NULL, None, nil, nothing, missing, undef, true, false, small integer, symbol, etc.

  • The o form can support ObjectBarrier. It only needs to log the object.
  • The os form can support field-logging barrier. It needs the address of the field in order to access side metadata. Note that it is not the Slot type which is intended for updating an object graph edge using the Slot::store method, and a Slot may not necessarily be inside the MMTk heap (can be on the stack or in malloc memory).
  • The ot form can support barriers that need to access the target, such as the SATB barrier (enqueues the old target), the Dijkstra/Steele-style grey mutator barriers (inspect/change the color of the new target), the XOR zone barrier (computes old_target XOR new_target), etc.
  • The ost form is the most general form.
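As a rough illustration of how the o and ot forms differ in what they consume, here is a minimal sketch. The `ObjectReference` and `Mutator` types below are toy stand-ins invented for this example, not the real mmtk-core types, and passing the mutator as `&mut Mutator` is an assumption.

```rust
use std::collections::HashSet;

// Toy stand-in for MMTk's ObjectReference (hypothetical, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

// Toy per-mutator state (hypothetical) shared by both barriers below.
#[derive(Default)]
pub struct Mutator {
    pub logged: HashSet<ObjectReference>, // objects that were already remembered
    pub modbuf: Vec<ObjectReference>,     // remembered objects, in logging order
    pub satb_queue: Vec<ObjectReference>, // old targets enqueued by the SATB barrier
}

// `o` form: an object-logging barrier only needs the object being written to.
pub fn object_reference_write_post_o(mutator: &mut Mutator, object: ObjectReference) {
    // Remember each object at most once.
    if mutator.logged.insert(object) {
        mutator.modbuf.push(object);
    }
}

// `ot` form: a SATB pre-barrier needs the old target so it can enqueue it.
pub fn object_reference_write_pre_ot(
    mutator: &mut Mutator,
    _object: ObjectReference,
    old_target: Option<ObjectReference>,
) {
    // None means the slot held NULL, a small integer, etc., so there is nothing to enqueue.
    if let Some(old) = old_target {
        mutator.satb_queue.push(old);
    }
}
```

Neither function needs a slot address, which is why a VM that cannot produce one can still use these two forms.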

Simplify the API by merging the forms

We can merge those into just two functions:

fn object_reference_write_pre_ost(mutator: Mutator, object: ObjectReference, slot_addr: Option<Address>, old_target: Option<ObjectReference>);
fn object_reference_write_post_ost(mutator: Mutator, object: ObjectReference, slot_addr: Option<Address>, new_target: Option<ObjectReference>);

That is, we use Option<T> to make both the slot_addr and the {old,new}_target optional. We explicitly write into the documentation that if a VM is not able to provide either of those arguments, or if the VM knows the barrier (such as ObjectBarrier) doesn't need the slot_addr or the new_target, it can just pass None for them.

This is actually very similar to what we currently have. We currently have a non-optional slot: Slot parameter, and that's probably the only thing that needs to be changed.

We can also create an InteriorPointer type to make it even clearer by making slot_addr an Option<InteriorPointer>. It emphasizes that if it is Some(iptr), the iptr must be in the MMTk heap (i.e. not in the malloc heap, not on the stack, and not NULL).
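A minimal sketch of what such an InteriorPointer could look like. The heap bounds, the `Mutator` and `PostWrite` types, and the checked constructor are all assumptions made for this example; nothing here is existing mmtk-core API.

```rust
// Hypothetical heap bounds, invented for this sketch only.
const HEAP_START: usize = 0x1000_0000;
const HEAP_END: usize = 0x2000_0000;

// Toy stand-in for MMTk's ObjectReference (not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

// An address that is known to lie inside the (toy) MMTk heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InteriorPointer(usize);

impl InteriorPointer {
    // Returns None for NULL, stack or malloc addresses, i.e. anything off-heap.
    pub fn new(addr: usize) -> Option<InteriorPointer> {
        if (HEAP_START..HEAP_END).contains(&addr) {
            Some(InteriorPointer(addr))
        } else {
            None
        }
    }
}

// Record of what the toy barrier was given (hypothetical, for inspection only).
#[derive(Debug, PartialEq, Eq)]
pub struct PostWrite {
    pub object: ObjectReference,
    pub slot_addr: Option<InteriorPointer>,
    pub new_target: Option<ObjectReference>,
}

#[derive(Default)]
pub struct Mutator {
    pub log: Vec<PostWrite>,
}

// Merged post-barrier: slot_addr and new_target are both optional.
pub fn object_reference_write_post_ost(
    mutator: &mut Mutator,
    object: ObjectReference,
    slot_addr: Option<InteriorPointer>,
    new_target: Option<ObjectReference>,
) {
    mutator.log.push(PostWrite { object, slot_addr, new_target });
}
```

With a checked constructor like this, a binding could write `object_reference_write_post_ost(&mut m, obj, InteriorPointer::new(addr), target)` and an off-heap `addr` would silently degrade to None instead of reaching the barrier as a bogus slot.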

Barrier form and Barrier semantics

When using a barrier semantics that needs less information, the VM can invoke a form that provides more information, and it still works.

  • For example, when using the ObjectBarrier, the VM can actually call object_reference_write_post_ost(m, obj, slot, target), and the ObjectBarrier simply ignores the slot and the target.

When using a barrier semantics that needs more information, but the VM is only able to provide less information, it may or may not work, depending on GC algorithms.

  • For example, when using field-logging barrier, but the VM can only identify the object that is changed because the field is accessed by C extensions or the field is off-heap in malloc memory (It happens in CRuby), it will not be able to log the fields.
    • But if we are implementing LXR or other coalescing RC, we can fall back to object-remembering if we can't do field-remembering for a particular object (or field). It will end up remembering more fields, but it still works.
  • For the SATB barrier, if the VM only tells MMTk that an object is modified, it can conservatively enqueue all children of the object. It is slower when executed, but it is still correct, and it won't even keep more objects alive than the "snapshot at the beginning".
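The conservative SATB fallback described above can be sketched as follows. The toy `Heap` map, the `OldTarget` enum and the function name `satb_write_pre` are assumptions made for illustration; a real implementation would scan the object through the VM binding rather than through a hash map.

```rust
use std::collections::HashMap;

// Toy stand-in for MMTk's ObjectReference (not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

// Toy heap: each object maps to its reference fields (None is a NULL-like value).
pub type Heap = HashMap<ObjectReference, Vec<Option<ObjectReference>>>;

// What the VM was able to tell the barrier about the overwritten value.
pub enum OldTarget {
    // The VM read the old value; it may still be a NULL-like value.
    Known(Option<ObjectReference>),
    // The VM only knows which object is being modified.
    Unknown,
}

// SATB pre-barrier: precise when the old target is known, conservative otherwise.
pub fn satb_write_pre(
    heap: &Heap,
    satb_queue: &mut Vec<ObjectReference>,
    object: ObjectReference,
    old_target: OldTarget,
) {
    match old_target {
        // Precise path: enqueue just the overwritten reference, if any.
        OldTarget::Known(target) => satb_queue.extend(target),
        // Conservative path: enqueue every child the object currently has.
        OldTarget::Unknown => {
            if let Some(fields) = heap.get(&object) {
                satb_queue.extend(fields.iter().flatten().copied());
            }
        }
    }
}
```

The conservative path enqueues only references the object still holds before the write, so everything it enqueues is either part of the snapshot or would be kept alive anyway.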

What about...

What about atomic swap and CAS?

This is a bit complicated. Due to concurrent access, we only know the actual "old value" after we do the swap or CAS. So if we do this:

let old_target = field.load();
let old_target2 = field.swap(new_target);

Then old_target may be different from old_target2.

It remains a question whether the write barrier actually needs the precise old_target2 at all. For SATB barrier, it doesn't because we only need to record the snapshot at the beginning. So only the oldest target matters. But if we do naive RC (it sucks anyway), we'll need the precise old_target2 to do the decrement.
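To make the difference concrete, here is a single-threaded sketch that simulates the race with an explicit store between the load and the swap, followed by a naive RC exchange that decrements the value returned by the swap. The field is modelled as an `AtomicUsize` holding an address with 0 as NULL, and `RefCounts`, `racy_example` and `rc_swap` are names invented for this example.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy reference counts keyed by object address; address 0 stands for NULL.
pub type RefCounts = HashMap<usize, isize>;

// Simulates the race on one thread: another mutator's store lands between
// our load and our swap, so the two "old" values differ.
pub fn racy_example() -> (usize, usize) {
    let field = AtomicUsize::new(0x10);
    let old_target = field.load(Ordering::Acquire);
    field.store(0x20, Ordering::Release); // stands in for a concurrent writer
    let old_target2 = field.swap(0x30, Ordering::AcqRel);
    (old_target, old_target2)
}

// Atomic exchange under naive RC: the decrement must use the value actually
// returned by the swap, not a value loaded earlier.
pub fn rc_swap(field: &AtomicUsize, new_target: usize, rc: &mut RefCounts) -> usize {
    if new_target != 0 {
        *rc.entry(new_target).or_insert(0) += 1;
    }
    let old_target2 = field.swap(new_target, Ordering::AcqRel);
    if old_target2 != 0 {
        *rc.entry(old_target2).or_insert(0) -= 1;
    }
    old_target2
}
```

In `racy_example` the load observes 0x10 while the swap returns 0x20. A SATB barrier could work with either value as long as the older one was recorded, but a decrement applied to 0x10 here would corrupt the counts.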

What about Sapphire?

Those forms don't cover the need to apply barriers for non-reference fields, or the need to write to two copies of the same object. It's not on our agenda.

What about other subsuming barriers?

They are discussed in https://github.com/mmtk/mmtk-core/issues/1038. The main idea is that making a general API for multiple operations (store, swap, CAS, with acquire/release/seqcst orders), multiple field layouts (fat, tagged, offsetted, compressed, handles, etc.) and non-reference values (tagged integers, true, false, multiple NULL flavors like nil, nothing, missing, undef, etc.) can be very complicated.

wks, Dec 04 '24 09:12

We discussed this on Monday. We think that there is an advantage of a subsuming API call. The VM binding only needs to call that API function, and the function signature will automatically contain all the information the VM binding needs to supply, such as the source object, the field, the new target, the old target, and, in the case of tagged or offsetted references, letting the VM binding specify the actual word to be written into the field, or specify a call back function that actually writes into the field. Either way is OK.

In comparison, providing multiple API functions (pre, post barriers, etc) will require MMTk core to document the expected use pattern in English, which is less precise than the signature of a subsuming API function, and require the VM binding developers to be disciplined because the programming language cannot help the developer identify abuse.

Rust should be able to inline the callback function. But if the VM binding needs high performance, the JIT compiler should still inline the fast path.
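A minimal sketch of the callback flavor, assuming an object-remembering post-barrier and a made-up tagging scheme (low bit set means small integer, 0 means NULL). `Mutator`, `decode` and `store_tagged` are hypothetical names, and this `object_reference_write` signature is a simplification for illustration rather than the current mmtk-core one.

```rust
use std::cell::Cell;

// Toy stand-in for MMTk's ObjectReference (not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

// Toy mutator state (hypothetical): just a buffer of remembered objects.
#[derive(Default)]
pub struct Mutator {
    pub modbuf: Vec<ObjectReference>,
}

// Subsuming write: the barrier owns the whole sequence, and the VM binding
// supplies the actual store as a closure. Being generic over F lets the
// compiler inline the closure body at each call site.
#[inline]
pub fn object_reference_write<F: FnOnce()>(
    mutator: &mut Mutator,
    object: ObjectReference,
    _new_target: Option<ObjectReference>, // unused by this object-remembering barrier
    write: F,
) {
    // A pre-barrier (e.g. SATB) would run here, before the store.
    write();
    // Post-barrier: unconditionally remember the modified object.
    mutator.modbuf.push(object);
}

// Made-up tagging scheme: low bit 1 is a small integer, 0 is NULL.
pub fn decode(word: usize) -> Option<ObjectReference> {
    if word == 0 || word & 1 == 1 {
        None
    } else {
        Some(ObjectReference(word))
    }
}

// VM-side helper: the binding decides which raw word ends up in the field.
pub fn store_tagged(
    mutator: &mut Mutator,
    object: ObjectReference,
    field: &Cell<usize>,
    word: usize,
) {
    object_reference_write(mutator, object, decode(word), || field.set(word));
}
```

The signature forces the caller to supply the object, the decoded target and the store itself in one call, which is the "signature as documentation" advantage described above.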

wks, Jun 27 '25 04:06

Well, I suggest we add the "old"/"previous" object to the barrier interface. In a SATB barrier, for example, the fast path needs to load the old object to check whether the slow path is needed, but in our current implementation the slow path has to load the object from the slot again. (This also involves some extra work in OpenJDK, where the slot address might need to be patched manually.)

tianleq, Jul 02 '25 01:07

More parameters in the write barrier API functions mean higher complexity for JIT compilers. That will require more software engineering effort, and may incur time and/or space overhead at run time.

Having the src, slot and new_val arguments means that the VM binding (in theory) needs to pass them. It is harmless to the callee if it does not use some of the parameters. But for the caller, the JIT compiler will need to generate the code sequence for preparing those arguments even if they are unused. That means more code for VM binding developers to write, and more instructions emitted at JIT time.

This may become challenging if class loading is also involved. For example, in OpenJDK, the C1 compiler can JIT-compile a field-writing instruction even when the field is not yet resolved. The JIT compiler will leave the field offset as 0, and insert a code-patching stub so that when the code is first executed, it will trap to the runtime, resolve the field, and patch the field-accessing instruction with the correct field offset. The base class method BarrierSetC1::store_at will emit the code of the actual store, and also emit the code-patching stub if needed. But if a pre-barrier needs to load from the field in the fast path, then the method that emits the pre-barrier for C1 will have to generate the code-patching stub because it loads the field before BarrierSetC1::store_at handles the actual field-writing and inserts the code-patching stub.

If we know that object-remembering barriers in MMTk, including the ObjectBarrier and the SATBBarrier, never use the slot or the new_val parameters, we can greatly simplify the implementation of C1 pre-barriers by not preparing those arguments or inserting code-patching stubs. In https://github.com/mmtk/mmtk-openjdk/pull/332, I did a hack by passing slot and target as nullptr. Knowing that MMTk core will never touch those arguments, the call will work properly for now. But it is still a hack. If we introduce field-logging barriers, we will still need the slot argument (or the old value as an argument, depending on the algorithm). If MMTk provides write barrier APIs that don't have those unused parameters, such as the object_reference_write_pre_o(mutator, obj) and object_reference_write_post_o(mutator, obj) functions I mentioned above, we will be able to formally omit those arguments for object-logging barriers.

wks, Sep 22 '25 16:09