store LazyReferenceManager CPU Usage high

Environment Details

EclipseStore Version: 2.0.0
JDK version: 21
OS: WIndows 11

Describe the bug

When using the default settings for the LazyReferenceManager, it exhausts one CPU core completely.

According to the defaults in the code, the maximum activity (made sure by time budgeting), should be 0,1%.

Reason

It seems that the actual cycles aren't taking that long. Instead the issue seems to stem from calling updateStatistics(). This seems to totally blow the budget. I have confirmed this by copying the LazyReferenceManager implementation and only removing the updateStatistics() method call. CPU usage instantly went to what the docs for the default settings describe.

How to fix it

Given how simple the code looks (its just a loop and some simple ifs basically), I don't think there is much room for optimisation here. Might it be possible to have the cleanup routine track this progress live and make the monitor stateful? This way monitoring would run within the budgeting and won't have to iterate over everything.

Alternatively, I was thinking that it should be possible to disable the statistics altogether if they aren't being accessed anyway.

Nov 25 '24 10:11 Bios-Marcel

Could you provide more information? I have launched the big load test but did not observe this behavior.

Could you also share a simple, reproducible example? Additionally, could you include screenshots showing how you detect this behavior on Windows 11?

Here is a screenshot from my performance tests:

Nov 25 '24 13:11 zdenek-jonas

Since I can't share my code, I'd need some time to provide an example here.

The database is roughly 16MB in size and contains only a couple of lazy objects. There are in total 12 known lazy reference instances.

Either way, the effect is visible in both Task Manager and VisualVM.

Task Manager:

VisualVM:

CPU Time equals time, meaning the thread is constantly doing compute:

The percentage differences between task manager and VisualVM are probably due to how Performance and Efficiency cores are treated.

But yeah, it doesn't really make a lot of sense that it takes this long for mere 12 instances. Does the call to monitor.update trigger any sort of eventing or so? Maybe there's bytecode magic that I don't see by looking at the monitor code. (or I am blind :D)

Nov 25 '24 14:11 Bios-Marcel

could you the screenshot of this page from visualvm:

Nov 25 '24 14:11 zdenek-jonas

Would've sent the latter earlier, but VisualVM didn't pick up on the details :D

Nov 25 '24 14:11 Bios-Marcel

Normally looked like this:

Maybe the NPE stuff was due to using debug mode earlier.

Nov 25 '24 14:11 Bios-Marcel

Are you store the data in local disc or some other target?

Nov 25 '24 14:11 zdenek-jonas

Please, calll jstack on this process and send me the thread with LazyReferenceManager...

Nov 25 '24 14:11 zdenek-jonas

We store inside an H2 database (local disk, no network IO), but I've confirmed this happens when storing to disk directly (without an SQL provider inbetween).

Created with -e -l: jstack.txt

Nov 25 '24 14:11 Bios-Marcel

Do you have a change timing on LazyReferenceManager or is all values defaults? Something like this: https://docs.eclipsestore.io/manual/storage/loading-data/lazy-loading/clearing-lazy-references.html

Nov 25 '24 15:11 zdenek-jonas

It's all default, I haven't touched anything.

Nov 25 '24 15:11 Bios-Marcel

Could you please provide the exact JDK version using java -v?

I currently have no further ideas, but I will try testing on a Windows 11 PC.

Nov 25 '24 15:11 zdenek-jonas

PS C:\Users\Schramm> java --version
openjdk 21.0.2 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-58)
OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)

Nov 25 '24 15:11 Bios-Marcel

Could you please try updating the JDK to version 21.0.5, if possible?

Nov 25 '24 16:11 zdenek-jonas

Makes no difference either.

Nov 26 '24 15:11 Bios-Marcel

Ok, I cannot simulate it on any of my projects. Please provide some code, and I will try to identify the problems.

Nov 26 '24 15:11 zdenek-jonas

I'll provide a comprehensive example once I find the time.

Nov 29 '24 11:11 Bios-Marcel

Is it possible that this is exactly the same as what I experienced on https://github.com/eclipse-serializer/serializer/issues/122? I've never been able to reproduce the scenario and haven't had time to test further in the last few months.

But the last thing I did was to find the cause, even if I didn't find how to get into this scenario. But maybe it will help. I guess no one noticed my comments because the issue was already closed.

Could be a simple fix and I would be willing to contribute that simple null check, but till now I am unsure if this really is the appropriate solution. Or if this shouldn't occure but it does because there is an error elsewhere...

Feb 04 '25 17:02 ZockiRR

I've already spent many hours testing in an attempt to simulate this behavior, but so far, without success.

What you can try when it happens:

Take multiple Thread Dumps (jstack) in succession and send them to us.
Use JFR to create a one-minute snapshot and send it to us, even privately if needed.

Feb 04 '25 18:02 zdenek-jonas

I took multiple thread dumps and attached them. I currently got my project running with the reported behaviour. As offered, I have opened a PR, which I currently think will solve the issue. The possibily that the ref is null is checked at other places too, so it doesn't seem to be entirely wrong.

threaddumps.zip

Also it stopped after some time. That is something I didn't observerd till now:

Feb 04 '25 19:02 ZockiRR

I guess this can be closed, given the PR was already merged :)

Sep 11 '25 09:09 Bios-Marcel