LazyReferenceManager CPU Usage high
Environment Details
- EclipseStore Version: 2.0.0
- JDK version: 21
- OS: WIndows 11
Describe the bug
When using the default settings for the LazyReferenceManager, it exhausts one CPU core completely.
According to the defaults in the code, the maximum activity (made sure by time budgeting), should be 0,1%.
Reason
It seems that the actual cycles aren't taking that long. Instead the issue seems to stem from calling updateStatistics(). This seems to totally blow the budget. I have confirmed this by copying the LazyReferenceManager implementation and only removing the updateStatistics() method call. CPU usage instantly went to what the docs for the default settings describe.
How to fix it
Given how simple the code looks (its just a loop and some simple ifs basically), I don't think there is much room for optimisation here. Might it be possible to have the cleanup routine track this progress live and make the monitor stateful? This way monitoring would run within the budgeting and won't have to iterate over everything.
Alternatively, I was thinking that it should be possible to disable the statistics altogether if they aren't being accessed anyway.
Could you provide more information? I have launched the big load test but did not observe this behavior.
Could you also share a simple, reproducible example? Additionally, could you include screenshots showing how you detect this behavior on Windows 11?
Here is a screenshot from my performance tests:
Since I can't share my code, I'd need some time to provide an example here.
The database is roughly 16MB in size and contains only a couple of lazy objects. There are in total 12 known lazy reference instances.
Either way, the effect is visible in both Task Manager and VisualVM.
Task Manager:
VisualVM:
CPU Time equals time, meaning the thread is constantly doing compute:
The percentage differences between task manager and VisualVM are probably due to how Performance and Efficiency cores are treated.
But yeah, it doesn't really make a lot of sense that it takes this long for mere 12 instances. Does the call to monitor.update trigger any sort of eventing or so? Maybe there's bytecode magic that I don't see by looking at the monitor code. (or I am blind :D)
could you the screenshot of this page from visualvm:
Would've sent the latter earlier, but VisualVM didn't pick up on the details :D
Normally looked like this:
Maybe the NPE stuff was due to using debug mode earlier.
Are you store the data in local disc or some other target?
Please, calll jstack on this process and send me the thread with LazyReferenceManager...
We store inside an H2 database (local disk, no network IO), but I've confirmed this happens when storing to disk directly (without an SQL provider inbetween).
Created with -e -l:
jstack.txt
Do you have a change timing on LazyReferenceManager or is all values defaults? Something like this: https://docs.eclipsestore.io/manual/storage/loading-data/lazy-loading/clearing-lazy-references.html
It's all default, I haven't touched anything.
Could you please provide the exact JDK version using java -v?
I currently have no further ideas, but I will try testing on a Windows 11 PC.
PS C:\Users\Schramm> java --version
openjdk 21.0.2 2024-01-16
OpenJDK Runtime Environment (build 21.0.2+13-58)
OpenJDK 64-Bit Server VM (build 21.0.2+13-58, mixed mode, sharing)
Could you please try updating the JDK to version 21.0.5, if possible?
Makes no difference either.
Ok, I cannot simulate it on any of my projects. Please provide some code, and I will try to identify the problems.
I'll provide a comprehensive example once I find the time.
Is it possible that this is exactly the same as what I experienced on https://github.com/eclipse-serializer/serializer/issues/122? I've never been able to reproduce the scenario and haven't had time to test further in the last few months.
But the last thing I did was to find the cause, even if I didn't find how to get into this scenario. But maybe it will help. I guess no one noticed my comments because the issue was already closed.
Could be a simple fix and I would be willing to contribute that simple null check, but till now I am unsure if this really is the appropriate solution. Or if this shouldn't occure but it does because there is an error elsewhere...
I've already spent many hours testing in an attempt to simulate this behavior, but so far, without success.
What you can try when it happens:
- Take multiple Thread Dumps (jstack) in succession and send them to us.
- Use JFR to create a one-minute snapshot and send it to us, even privately if needed.
I took multiple thread dumps and attached them. I currently got my project running with the reported behaviour. As offered, I have opened a PR, which I currently think will solve the issue. The possibily that the ref is null is checked at other places too, so it doesn't seem to be entirely wrong.
Also it stopped after some time. That is something I didn't observerd till now:
I guess this can be closed, given the PR was already merged :)