Upgrading Unleash causes Java Out of Memory
Describe the bug
When we upgraded the Unleash Java client from version 10.2.2 to 11.0.0, we started seeing errors in production.
We reverted the upgrade and the errors went away.
Steps to reproduce the bug
No response
Expected behavior
No response
Logs, error output, etc.
An unhandled exception occurred for request:
com.dylibso.chicory.runtime.WasmRuntimeException: out of bounds memory access: attempted to access address: -759324 but limit is: 1192886272 and size: 4
at com.dylibso.chicory.runtime.ByteBufferMemory.outOfBoundsException(ByteBufferMemory.java:155)
at com.dylibso.chicory.runtime.ByteBufferMemory.writeI32(ByteBufferMemory.java:207)
at io.getunleash.wasm.YggdrasilMachine$AotMethods.memoryWriteInt(AotMethods.java:126)
at io.getunleash.wasm.YggdrasilMachine.func_1173(wasm)
at io.getunleash.wasm.YggdrasilMachine.func_1132(wasm)
at io.getunleash.wasm.YggdrasilMachine.func_1149(wasm)
at io.getunleash.wasm.YggdrasilMachine.func_438(wasm)
at io.getunleash.wasm.YggdrasilMachine.func_362(wasm)
at io.getunleash.wasm.YggdrasilMachine$MachineCall.call_362(Unknown Source)
at io.getunleash.wasm.YggdrasilMachine$MachineCall.call_dispatch_0(Unknown Source)
at io.getunleash.wasm.YggdrasilMachine$MachineCall.call(Unknown Source)
at io.getunleash.wasm.YggdrasilMachine.call(wasm)
at com.dylibso.chicory.runtime.Instance$Exports.lambda$function$0(Instance.java:214)
at io.getunleash.engine.WasmInterface.checkEnabled(WasmInterface.java:142)
at io.getunleash.engine.UnleashEngine.isEnabled(UnleashEngine.java:219)
at io.getunleash.repository.FeatureRepositoryImpl.isEnabled(FeatureRepositoryImpl.java:183)
at io.getunleash.EngineProxyImpl.isEnabled(EngineProxyImpl.java:48)
at io.getunleash.DefaultUnleash.isEnabled(DefaultUnleash.java:117)
at io.getunleash.Unleash.isEnabled(Unleash.java:20)
at io.getunleash.DefaultUnleash.isEnabled(DefaultUnleash.java:99)
....
Screenshots
No response
Additional context
No response
Unleash version
11.0.0
Subscription type
None
Hosting type
None
SDK information (language and version)
Java 21
I can confirm we also see java.lang.OutOfMemoryError after upgrading to version 11.0.0. Downgrading fixes the problem.
Hey @Syuziko, we're looking into this one but finding it really hard to reproduce. I have some patches on the way out that I think may help but without a validation case, I can't promise anything.
By any chance are you using multiple instances of the Unleash SDK in the same process?
Hey @Karamell, just to confirm, are you seeing the same error as Syuziko or an OutOfMemoryError? The original stack trace looks to be an out of bounds write to my eyes
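To illustrate the distinction being asked about: an out-of-bounds write is a rejected access to the module's linear memory, not a JVM heap exhaustion. The sketch below is a hypothetical stand-in for a bounds-checked, ByteBuffer-backed memory. It is not Chicory's implementation, and `LinearMemorySketch` and its methods are assumed names used only for illustration. It shows why a negative address such as -759324 can never pass the check: read as an unsigned 32-bit value, the same bits are 4294207972, far beyond the reported limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for a bounds-checked linear memory; not Chicory's code. */
final class LinearMemorySketch {
    private final ByteBuffer buffer;

    LinearMemorySketch(int sizeInBytes) {
        // WebAssembly linear memory is little-endian.
        this.buffer = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Writes a 32-bit value, rejecting any address outside [0, limit - 4]. */
    void writeI32(int address, int value) {
        int limit = buffer.limit();
        if (address < 0 || address > limit - Integer.BYTES) {
            // Message shaped like the one in the report, for comparison only.
            throw new IndexOutOfBoundsException(
                    "out of bounds memory access: attempted to access address: " + address
                            + " but limit is: " + limit + " and size: " + Integer.BYTES);
        }
        buffer.putInt(address, value);
    }

    /** Reads a 32-bit value; ByteBuffer itself rejects out-of-range indexes. */
    int readI32(int address) {
        return buffer.getInt(address);
    }

    /** The unsigned interpretation of a signed 32-bit address. */
    static long asUnsigned(int address) {
        return Integer.toUnsignedLong(address);
    }
}
```

A `java.lang.OutOfMemoryError: Java heap space`, by contrast, comes from the JVM allocator and can surface anywhere in the process, so the two symptoms may or may not share a cause.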
@sighphyre: This is how it looks:
stackTrace: com.dylibso.chicory.runtime.WasmRuntimeException: out of bounds memory access: attempted to access address: -326316 but limit is: 301989888 and size: 4
at com.dylibso.chicory.runtime.ByteBufferMemory.outOfBoundsException(ByteBufferMemory.java:155)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoDefer] :
also a stack trace with only
java.lang.OutOfMemoryError: Java heap space
that happened at the same time
Ah, thanks for the info here. Okay, I might actually know what's causing this and have a fix for it. Are you using multiple instances of the SDK in the same process by any chance? Or tearing down and recreating the SDK at some point?
@sighphyre AFAIK it's not multiple instances. We use spring and it's instantiated in a @Bean. Nothing fancy.
Hey @Karamell @Syuziko,
So we can't reproduce this. At all. We've pressure tested the SDK in as many ways as we can think of. We also see that the adoption of this version is already fairly widespread and no one else is reporting this, so I think it's something very specific to your setups.
We've made some changes to the way we handle memory and locking here but without a reproduction case we can't promise anything. If you feel like helping us out by checking in a staging/testing environment that would be amazing but no pressure.
Rolling back to a previous version is absolutely fine, we'll continue to support 9.x and 10.x for at least a few years but we also really, really want to get to the bottom of this bug. If there's any insight either of you could shed on the lead up to the exception being raised that would be amazing and if either of you are willing to have a conversation with us on our Slack channel to see if we can't work it out over a chat that would also be amazing.
Also a reproducible example would be absolutely golden, if possible
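For anyone attempting a reproduction, one possible starting point is a small harness that calls the toggle check from many threads at once and records whatever escapes. This is only a sketch under assumptions: `ToggleStressHarness` and `hammer` are hypothetical names that are not part of the SDK, and nothing in this thread confirms that concurrency alone triggers the failure.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical stress harness; not part of the Unleash SDK. */
final class ToggleStressHarness {

    /**
     * Calls check from the given number of threads at once, callsPerThread times each,
     * and returns every Throwable that escaped. A worker stops at its first failure.
     */
    static List<Throwable> hammer(Predicate<String> check, String toggleName,
                                  int threads, int callsPerThread) {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together to maximise contention
                    for (int i = 0; i < callsPerThread; i++) {
                        check.test(toggleName);
                    }
                } catch (Throwable e) {
                    failures.add(e); // runtime exceptions and errors alike
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return failures;
    }
}
```

With an existing `Unleash` instance in scope, a call could look like `ToggleStressHarness.hammer(name -> unleash.isEnabled(name), "my-toggle", 32, 100_000)`, where the toggle name and the counts are placeholders. Running it while the SDK is refreshing its feature set, and with a realistic context, is likely closer to production than a tight loop alone.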
Hi @sighphyre ,
I'll try to reproduce and provide more details.
You mentioned that you've made some changes to memory handling and locking. I can try running it on our side and see whether it resolves the issue.
@Syuziko Thank you so much!
Our service went down because of this on version 11.1.0 too.
We're seeing similar things on 11.x too.
I've also seen a number of exceptions thrown from com.dylibso.chicory.runtime. Mostly com.dylibso.chicory.runtime.TrapException: Trapped on unreachable instruction
Hi team,
We’re seeing similar issues on version 11.x as well. We’ve also observed a number of exceptions thrown from com.dylibso.chicory.runtime, mostly:
Trapped on unreachable instruction
We have not been able to reliably reproduce the problem yet, but it is causing our service to crash in production.
Any guidance or insights would be greatly appreciated.
Thanks in advance!
While reviewing the code, I noticed that the DefaultUnleash class defines a shutdown() method:
@Override
public void shutdown() {
featureRepository.shutdown();
config.getScheduledExecutor().shutdown();
}
In the example examples/spring-boot-example/src/main/java/io/getunleash/unleash/example/UnleashSpringConfig.java , the bean is declared as follows:
@Bean
public Unleash unleash(UnleashConfig unleashConfig) {
return new DefaultUnleash(unleashConfig);
}
Shouldn’t it be declared like this instead, so that the shutdown() method is automatically invoked when the Spring context is closed?
@Bean(destroyMethod = "shutdown")
public Unleash unleash(UnleashConfig unleashConfig) {
return new DefaultUnleash(unleashConfig);
}
Or is the shutdown() method triggered in another way that I might have missed?
Thanks to @frafaelcb, we've managed to narrow down the issue to read access to the INSTANCE property. We expect this issue to be fixed with https://github.com/Unleash/unleash-java-sdk/pull/324, which was released in version 11.1.1: https://github.com/Unleash/unleash-java-sdk/releases/tag/v11.1.1
I'm considering closing the issue after waiting a bit longer for confirmation. If you do have successful examples running this latest version, please let us know in the comments. Thanks!
Today we updated to the new version, v11.1.1, in the service where the problem occurred. Two days ago we rolled it out to a less critical service, and so far everything is fine.
We'll monitor the situation this weekend, and I'll update you with the results next week.
Good evening!
Unfortunately, we still had some problems with the new version, although they are occurring less frequently than before. I'll leave the call stack here.
com.dylibso.chicory.runtime.WasmRuntimeException: out of bounds memory access: attempted to access address: -629548 but limit is: 1117716480 and size: 4
at com.dylibso.chicory.runtime.ByteBufferMemory.outOfBoundsException(ByteBufferMemory.java:303) ~[runtime-1.5.1.jar!/:?]
at com.dylibso.chicory.runtime.ByteBufferMemory.writeI32(ByteBufferMemory.java:355) ~[runtime-1.5.1.jar!/:?]
at io.getunleash.wasm.YggdrasilMachineShaded.memoryWriteInt(Shaded.java:155) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.func_1237(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.func_1195(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.func_1213(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.func_509(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.func_353(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineFuncGroup_0.call_353(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineDispatch_0.call_dispatch_0(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachineMachineCall.call(Unknown Source) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.wasm.YggdrasilMachine.call(wasm) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at com.dylibso.chicory.runtime.Instance$Exports.lambda$function$0(Instance.java:219) ~[runtime-1.5.1.jar!/:?]
at io.getunleash.engine.WasmInterface.checkVariant(WasmInterface.java:173) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.engine.UnleashEngine.getVariant(UnleashEngine.java:257) ~[yggdrasil-engine-0.4.2.jar!/:0.4.2]
at io.getunleash.repository.FeatureRepositoryImpl.getVariant(FeatureRepositoryImpl.java:228) ~[unleash-client-java-11.1.1.jar!/:11.1.1]
at io.getunleash.EngineProxyImpl.getVariant(EngineProxyImpl.java:53) ~[unleash-client-java-11.1.1.jar!/:11.1.1]
at io.getunleash.DefaultUnleash.getVariant(DefaultUnleash.java:149) ~[unleash-client-java-11.1.1.jar!/:11.1.1]
at io.getunleash.DefaultUnleash.getVariant(DefaultUnleash.java:132) ~[unleash-client-java-11.1.1.jar!/:11.1.1]
at
Thanks @frafaelcb for the details! We've finally made the decision to drop WASM with Chicory as the engine due to all these errors. Instead, we're swapping to plain FFI (same implementation as in the v10 series, which is stable) but with some important improvements:
- By using FlatBuffers, we managed to bring its performance on par with the WASM implementation.
- We’re no longer bound to libc, which means the SDK should run without any external native libraries (this was a problem in v10: https://github.com/Unleash/unleash-java-sdk/issues/275).
We're releasing this as a beta: 11.2.0.beta.1
Our ask is: if you or someone in this thread is willing to give it a try, please let us know how it behaves in your environment. We’ve been running this version ourselves for 4 days already in a production-like setup without crashes or memory leaks.
Benchmarks
We ran JMH throughput benchmarks comparing:
- 10.2.2 (v10 engine)
- 11.x (WASM + Chicory)
- 11.2.0.beta.1 (new FFI + FlatBuffers engine)
Tests were performed on both an Apple M2 and a Ryzen 5800H, and the results were very consistent.
Summary:
- Coming from 10.2.x → expect a 3–6× performance improvement on isEnabled / getVariant.
- Coming from 11.x (WASM) → performance should be on par or slightly better; no regressions expected.
- Results also confirm that 10.2.2 was not purely CPU-bound, as both CPUs scale significantly with the new engine.
Numbers at a glance (ops/s, JMH thrpt):
10.2.2
- M2: ~41k–59k
- 5800H: ~48k–62k
11.2.0-SNAPSHOT (FFI + FlatBuffers)
- M2: ~256k–271k
- 5800H: ~163k–166k
This places the new FFI engine on the same level as the previous WASM engine, while avoiding the runtime and libc issues we encountered with Chicory.
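As a rough cross-check of the summary against the ops/s ranges quoted above, the speedup bounds can be computed directly. The sketch below uses only the figures listed in this comment (in thousands of ops/s) and takes the most conservative and most generous pairing of each range. The class and method names are illustrative, not taken from the benchmark code.

```java
import java.util.Locale;

/** Cross-checks the speedup range implied by the ops/s figures quoted in this comment. */
final class SpeedupCheck {

    /** Most conservative ratio: slowest new result against fastest old result. */
    static double minSpeedup(double oldHigh, double newLow) {
        return newLow / oldHigh;
    }

    /** Most generous ratio: fastest new result against slowest old result. */
    static double maxSpeedup(double oldLow, double newHigh) {
        return newHigh / oldLow;
    }

    public static void main(String[] args) {
        // 10.2.2 vs 11.2.0-SNAPSHOT, in thousands of ops/s.
        System.out.println(String.format(Locale.ROOT, "M2: %.1fx to %.1fx",
                minSpeedup(59, 256), maxSpeedup(41, 271)));
        System.out.println(String.format(Locale.ROOT, "5800H: %.1fx to %.1fx",
                minSpeedup(62, 163), maxSpeedup(48, 166)));
    }
}
```

This works out to roughly 4.3x to 6.6x on the M2 and 2.6x to 3.5x on the 5800H, which is in line with the 3–6× range given in the summary.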
If anyone tries out 11.2.0.beta.1, we’d really appreciate any feedback — especially around stability and performance.
Great, I'll talk to the team here and we'll use the same strategy. We'll include it in a less critical system for a week and then move to the most critical one here. I'll let you know as soon as I update.