mmtk-core

Number of (non-precise) stress GCs is sensitive to number of mutator threads

Open k-sareen opened this issue 3 years ago • 4 comments

It seems that for the same stress factor, changing the number of mutator threads makes the number of GCs triggered by non-precise stress (i.e. MMTK_PRECISE_STRESS=false) wildly different. This may be either an issue with how non-precise stress works or a bug in stress factor handling that only manifests with non-precise stress.

k-sareen avatar Mar 14 '22 06:03 k-sareen

Non-precise stress testing counts the entire thread-local buffer as allocated, because it only increments the allocation-bytes counter and checks it against the stress factor in the slow path. So the timing of a stress GC depends on the thread-local buffer size, the number of mutator threads, and the stress factor (relative to the thread-local buffer size). The code comment states that non-precise stress testing should be used with a large stress factor.

So it is expected that the number of mutator threads will change the number of stress GCs. However, if you notice a bug, please let me know.
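
To make the explanation above concrete, here is a minimal toy model in Rust. It is a sketch under stated assumptions, not mmtk-core code: the function `count_stress_gcs`, its parameters, the round-robin scheduling of mutators, and the rule that a GC retires every thread-local buffer are all invented for illustration.

```rust
/// Toy model of stress-GC counting. NOT mmtk-core code; every name here is hypothetical.
///
/// Assumptions: mutators allocate fixed-size objects in round-robin order, a stress GC
/// retires all thread-local buffers, and the stress check only happens on the slow path.
fn count_stress_gcs(
    num_mutators: usize,
    total_objects: usize,
    object_size: usize,
    buffer_size: usize,
    stress_factor: usize,
    precise: bool,
) -> usize {
    assert!(num_mutators > 0 && buffer_size >= object_size);

    // Free bytes left in each mutator's thread-local buffer.
    let mut remaining = vec![0usize; num_mutators];
    // Bytes counted towards the stress factor since the last stress GC.
    let mut counted = 0usize;
    let mut gcs = 0usize;

    for i in 0..total_objects {
        let m = i % num_mutators;
        // Precise mode: every allocation takes the slow path.
        // Non-precise mode: only a buffer refill takes the slow path.
        let slow_path = precise || remaining[m] < object_size;
        if slow_path {
            if counted >= stress_factor {
                gcs += 1;
                counted = 0;
                // Assumed: a GC retires every mutator's buffer.
                remaining.iter_mut().for_each(|r| *r = 0);
            }
            if precise {
                // Count exactly the bytes requested.
                counted += object_size;
            } else {
                // Count the whole buffer as allocated, even though it is still empty.
                counted += buffer_size;
                remaining[m] = buffer_size;
            }
        }
        if !precise {
            // Bump allocation inside the thread-local buffer.
            remaining[m] -= object_size;
        }
    }
    gcs
}
```

With deliberately tiny, hypothetical parameters (`total_objects = 120`, `object_size = 1`, `buffer_size = 4`, `stress_factor = 10`), the precise mode gives 11 GCs for both one and two mutators, while the non-precise mode gives 9 GCs with one mutator and 13 with two. The model only shows that the count can depend on the number of mutators when whole buffers are counted; it does not attempt to reproduce the direction or size of the trend reported in this issue.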

qinsoon avatar Mar 14 '22 07:03 qinsoon

The stress factors used were >> the TLAB size. They were on the order of MBs, whereas the TLAB is around 32KB, from memory.

k-sareen avatar Mar 14 '22 07:03 k-sareen

If you are working on this, that's fine, just keep us posted. If you are not working on this, can you give more details on what you have observed for the issue and how to reproduce the issue?

qinsoon avatar Mar 14 '22 23:03 qinsoon

Sorry -- I'm currently not working on it as I'm working on a paper. I have also noticed a similar issue wherein ImmixAllocator under precise stress does not trigger the same number of GCs as a plan using BumpAllocator (SemiSpace, GenCopy, MarkCompact). I believe it is related to an implementation bug in ImmixAllocator. I might open a separate issue for this though.

To reproduce the non-precise stress GC issue, run:

MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evaluation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch

and

MMTK_PLAN=SemiSpace MMTK_PRECISE_STRESS=false MMTK_STRESS_FACTOR=10485760 taskset -c 0-7 ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/kunals/git/evaluation/probes -Xms4192M -Xmx4192M -cp /usr/share/benchmarks/dacapo/dacapo-evaluation-git-29a657f.jar:/home/kunals/git/evaluation/probes:/home/kunals/git/evaluation/probes/probes.jar Harness -c probe.DacapoChopinCallback -n 2 lusearch

You'll notice a different number of GCs for the two runs (with the trend being more threads => fewer GCs). Ideally, the number of stress GCs would be independent of the number of mutator threads, since it should depend only on the number of bytes allocated.
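
As a rough illustration of that expectation, the arithmetic can be sketched in Rust. Both helper functions are hypothetical and not part of mmtk-core. The slop bound assumes that, per GC cycle, each mutator holds at most one thread-local buffer that was counted in full but only partly used.

```rust
/// Ideal stress-GC count: one GC per `stress_factor` bytes allocated,
/// regardless of how many mutators did the allocating.
fn ideal_stress_gcs(total_allocated_bytes: u64, stress_factor: u64) -> u64 {
    total_allocated_bytes / stress_factor
}

/// Upper bound on bytes over-counted per GC cycle, as a fraction of the stress factor.
/// ASSUMPTION: each mutator contributes at most one partly unused, fully counted buffer.
fn max_buffer_slop_fraction(num_mutators: u64, buffer_size: u64, stress_factor: u64) -> f64 {
    (num_mutators * buffer_size) as f64 / stress_factor as f64
}
```

For example, with MMTK_STRESS_FACTOR=10485760 and a hypothetical 1 GiB of total allocation, `ideal_stress_gcs(1 << 30, 10_485_760)` gives 102. With 8 mutators (a hypothetical count) and the roughly 32KB buffer mentioned earlier in the thread, `max_buffer_slop_fraction(8, 32 * 1024, 10_485_760)` gives 0.025. Under these assumptions, buffer granularity alone would account for only a few percent of deviation per GC cycle.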

I can look into this in more detail after the paper deadline (Sunday 20th).

k-sareen avatar Mar 15 '22 02:03 k-sareen