mmtk-core: Number of (non-precise) stress GCs is sensitive to the number of mutator threads

It seems that, for the same stress factor, changing the number of mutator threads makes the number of GCs triggered by non-precise stress (i.e. MMTK_PRECISE_STRESS=false) vary wildly. This may be an issue with how non-precise stress works, or a bug in stress-factor handling that only manifests with non-precise stress.

Mar 14 '22 06:03 k-sareen

The non-precise stress test counts the whole thread-local buffer as allocated, because it only increments the allocation-bytes counter and checks it against the stress factor in the slow path. So the timing of stress GCs depends on the thread-local buffer size, the number of mutator threads, and the stress factor relative to the thread-local buffer size. The relevant code comment states that the non-precise stress test should be used with a large stress factor.
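To make that counting rule concrete, here is a minimal Rust sketch. It is not mmtk-core code: `simulate` and its parameters are hypothetical. It assumes each mutator bump-allocates fixed-size objects from its own TLAB, and that the non-precise counter is bumped by the full TLAB size on every slow-path refill and never on the fast path. It only shows how the counted bytes run ahead of the bytes actually allocated; it does not model GCs and is not meant to reproduce the trend reported later in the thread.

```rust
// Hypothetical model of the non-precise counting rule; names are illustrative only.

/// Simulates `threads` mutators allocating `objects` objects of `obj_size` bytes
/// round-robin, each thread using its own TLAB of `tlab_size` bytes.
/// Returns (bytes actually allocated, bytes a non-precise counter would record).
fn simulate(threads: usize, objects: usize, obj_size: usize, tlab_size: usize) -> (usize, usize) {
    assert!(threads > 0 && obj_size > 0 && obj_size <= tlab_size);
    let mut tlab_free = vec![0usize; threads]; // free bytes left in each thread's TLAB
    let mut allocated = 0; // precise view: exact object bytes
    let mut counted = 0; // non-precise view: whole TLABs, charged at refill time
    for i in 0..objects {
        let t = i % threads;
        if tlab_free[t] < obj_size {
            // Slow path: abandon the old TLAB's tail and acquire a fresh TLAB.
            // The non-precise rule charges the entire new buffer here.
            tlab_free[t] = tlab_size;
            counted += tlab_size;
        }
        // Fast path: bump-allocate inside the TLAB; nothing is counted.
        tlab_free[t] -= obj_size;
        allocated += obj_size;
    }
    (allocated, counted)
}

fn main() {
    let tlab = 32 * 1024; // ~32KB, the TLAB size mentioned in this thread
    for threads in [1, 8] {
        let (allocated, counted) = simulate(threads, 1000, 48, tlab);
        println!("{} thread(s): allocated {} B, counted {} B", threads, allocated, counted);
    }
}
```

Both runs allocate 48000 bytes, but the single thread is charged 65536 bytes (two TLABs) while eight threads are charged 262144 bytes, since each of the eight has been charged a whole TLAB it has barely used.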

So it is expected that the number of mutator threads will change the number of stress GCs. However, if you notice a bug, please let me know.

Mar 14 '22 07:03 qinsoon

The stress factors I used were >> the TLAB size: on the order of MBs, whereas the TLAB is around 32KB, from memory.
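For scale, a rough back-of-the-envelope calculation, assuming the 10485760-byte stress factor used in the commands further down, the ~32KB (32768-byte) TLAB recalled above, and N = 8 mutator threads chosen only for illustration. The first ratio is how many TLABs fit in one stress period; the second is the share of the factor taken up if every thread held one fully counted but unused TLAB when a GC is triggered:

$$
\frac{10485760\ \text{B}}{32768\ \text{B}} = 320\ \text{TLABs},
\qquad
\frac{N \times 32768\ \text{B}}{10485760\ \text{B}} = \frac{8 \times 32768}{10485760} = 0.025 = 2.5\%
$$

Under this simplified model, TLAB-granularity counting alone would be expected to shift the GC count by only a few percent per stress period.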

Mar 14 '22 07:03 k-sareen

If you are working on this, that's fine; just keep us posted. If you are not working on this, can you give more details on what you have observed and how to reproduce the issue?

Mar 14 '22 23:03 qinsoon

Sorry, I'm currently not working on it as I'm working on a paper. I also noticed a similar issue: with precise stress, ImmixAllocator does not trigger the same number of GCs as plans using BumpAllocator (SemiSpace, GenCopy, MarkCompact). I believe it is related to an implementation bug in ImmixAllocator, though I might open a separate issue for it.

To reproduce the non-precise stress GC issue, run:

MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evaluation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch

and

MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 taskset -c 0-7 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evaluation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch

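You'll notice a different number of GCs in the two runs, which differ only in that the second is pinned to CPUs 0-7 with taskset (and so, presumably, runs lusearch with fewer mutator threads). The trend is more threads => fewer GCs. Ideally, the number of stress GCs would be independent of the number of mutator threads, since it should depend only on the number of bytes allocated.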

I can look into this in more detail after the paper deadline (Sunday 20th).

Mar 15 '22 02:03 k-sareen