Number of (non-precise) stress GCs is sensitive to number of mutator threads
For the same stress factor, changing the number of mutator threads makes the number of GCs triggered by non-precise stress (i.e. MMTK_PRECISE_STRESS=false) wildly different. This may be either an issue with how non-precise stress works or a bug in stress factor handling that only manifests with non-precise stress.
The non-precise stress test counts the whole thread-local buffer as allocated, because it only increments the allocation bytes counter and checks it against the stress factor in the slowpath. So the timing of a stress GC depends on the thread-local buffer size, the number of mutator threads, and the stress factor (relative to the thread-local buffer size). The comment states that the non-precise stress test should be used with a large stress factor.
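As a rough illustration of the accounting described above, here is a minimal Rust sketch. It is a simplified, single-threaded model and not MMTk's actual code: the function name `count_stress_gcs`, its parameters, the round-robin scheduling of mutators, and the choice not to reset buffers after a GC are all assumptions made for illustration.

```rust
/// Hypothetical, simplified model of stress-GC accounting (not MMTk's code).
/// Simulates `mutators` threads that each perform `allocs_per_mutator`
/// allocations of `object_size` bytes in round-robin order, and returns the
/// number of stress GCs that would be triggered.
fn count_stress_gcs(
    mutators: usize,
    allocs_per_mutator: usize,
    object_size: usize,
    tlab_size: usize,
    stress_factor: usize,
    precise: bool,
) -> usize {
    assert!(object_size <= tlab_size, "object must fit in one buffer");

    let mut remaining = vec![0usize; mutators]; // bytes left in each mutator's buffer
    let mut counted = 0usize; // global "allocation bytes" counter
    let mut gcs = 0usize;

    for _ in 0..allocs_per_mutator {
        for m in 0..mutators {
            if precise {
                // Precise: every allocation is counted at its real size.
                counted += object_size;
            } else if remaining[m] >= object_size {
                // Fast path: bump inside the buffer, no counting and no check.
                remaining[m] -= object_size;
                continue;
            } else {
                // Slow path: refill and count the whole buffer as allocated.
                remaining[m] = tlab_size - object_size;
                counted += tlab_size;
            }
            // The stress check happens only where the counter is updated.
            if counted >= stress_factor {
                gcs += 1;
                counted = 0;
            }
        }
    }
    gcs
}
```

In this model, with illustrative values of 64-byte objects and a 32 KiB buffer, a stress factor of only two buffers (65536) gives 2 GCs in precise mode but 4 GCs in non-precise mode when 8 mutators each make 256 allocations, because every mutator's first buffer is counted in full. With a 10 MiB stress factor and the same total allocation volume split across 1 or 8 mutators, the model gives the same count in both modes. It only illustrates the counting scheme and says nothing about what the real implementation does.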
So it is expected that the number of mutator threads will change the number of stress GCs. However, if you notice a bug, please let me know.
The stress factors I used were much larger than (>>) the TLAB size: they were on the order of MBs, whereas the TLAB is around 32KB, from memory.
If you are working on this, that's fine, just keep us posted. If you are not working on this, can you give more details on what you have observed for the issue and how to reproduce the issue?
Sorry -- I'm currently not working on it as I'm working on a paper. I also note a similar issue wherein ImmixAllocator on precise stress does not have the same number of GCs as a GC using BumpAllocator (SemiSpace, GenCopy, MarkCompact). I believe it is related to some implementation bug in ImmixAllocator. I might make a separate issue for this though.
To reproduce the non-precise stress GC issue, run:
MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evalutation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch
and (the same command, pinned to cores 0-7 with taskset):
MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 taskset -c 0-7 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evalutation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch
You'll notice a different number of GCs between the two runs (with the trend being more threads => fewer GCs). Ideally, the number of stress GCs should be independent of the number of mutator threads, since it should depend only on the number of bytes allocated.
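To make that expectation concrete, here is a minimal sketch of the arithmetic. `expected_stress_gcs` is a hypothetical helper, not an MMTk API, and it assumes a GC is triggered once per `stress_factor` bytes allocated in total, whichever thread allocated them.

```rust
/// Hypothetical helper (not part of MMTk): the number of stress GCs expected
/// if one GC is triggered for every `stress_factor` bytes allocated overall,
/// independent of how many mutator threads did the allocating.
fn expected_stress_gcs(total_allocated_bytes: u64, stress_factor: u64) -> u64 {
    assert!(stress_factor > 0, "stress factor must be non-zero");
    total_allocated_bytes / stress_factor
}
```

For example, with MMTK_STRESS_FACTOR=10485760, a run that allocates 1 GiB in total (an illustrative figure, not a measurement) would be expected to see 1073741824 / 10485760 = 102 stress GCs, whatever the thread count.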
I can look into this in more detail after the paper deadline (Sunday 20th).