8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)
Issue
The test compiler/startup/StartupOutput.java fails intermittently due to a crash after correctly printing the error Initial size of CodeCache is too small (the test limits the code cache using k-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`).
The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well.
Causes
There is one occurrence during C1 initialization and one during C2 initialization where a call to RuntimeStub::new_runtime_stub can fail fatally if there is not enough space left.
For C1: Compiler::init_c1_runtime -> Runtime1::initialize -> Runtime1::generate_blob_for -> Runtime1::generate_blob -> RuntimeStub::new_runtime_stub.
For C2: C2Compiler::initialize -> OptoRuntime::generate -> OptoRuntime::generate_stub -> Compile::Compile -> Compile::Code_Gen -> PhaseOutput::install -> PhaseOutput::install_stub -> RuntimeStub::new_runtime_stub.
Solution
https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to RuntimeStub::new_runtime_stub to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed.
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [ ] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) (Bug - P4)(⚠️ The fixVersion in this issue is [24] but the fixVersion in .jcheck/conf is 23, a new backport will be created when this pr is integrated.)
Reviewers
- Tobias Hartmann (@TobiHartmann - Reviewer) ⚠️ Review applies to f16d9910
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280
$ git checkout pull/19280
Update a local copy of the PR:
$ git checkout pull/19280
$ git pull https://git.openjdk.org/jdk.git pull/19280/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19280
View PR using the GUI difftool:
$ git pr show -t 19280
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19280.diff
Webrev
:wave: Welcome back dfenacci! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@dafedafe This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)
Reviewed-by: thartmann, kvn
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 37 new commits pushed to the master branch:
- 6f3e3fd0d4f5e80e3fdbd26be6483c672479802a: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label
- ed422ed1a3d6cdb733bc878c4173b43eb2dfb3da: 8338817: Wrong indent in API docs for java.lang.management.ManagementFactory
- 288fa60ebee445bb2835f096d144b9c6dea98df6: 8338891: HotSpotDiagnosticsMXBean missing @since tag
- dc4fd896289db1d2f6f7bbf5795fec533448a48c: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines
- 3a88fd437dfb218df5d3338c8ee7d70416839cf8: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop()
- 62dad3a9ea222b0fbf15668d6e7b1c4ed61b2532: 8339351: Remove duplicate line in FileMapHeader::print
- 0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only
- b1163bcc88a5b88b9a56d5584310f1d679690ab2: 8256211: assert fired in java/net/httpclient/DependentPromiseActionsTest (infrequent)
- 03ba37e60ce08def6afd172efc1cdbbcc856c633: 8339169: Remove NaiveHuffman coder
- a136a85b6f5bbc92727883693c1ce31c37a82fd5: 8338768: Introduce runtime lookup to check for static builds
- ... and 27 more: https://git.openjdk.org/jdk/compare/ff59532ddd3002df61e46d58b3f29d26c78295da...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@dafedafe this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8326615
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
@dafedafe The following label will be automatically applied to this pull request:
-
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 09: Full (e7d977e2)
- 08: Full - Incremental (2bf65dad)
- 07: Full - Incremental (0ea93c08)
- 06: Full (f79d0a31)
- 05: Full - Incremental (c7f484a4)
- 04: Full - Incremental (ea23c61e)
- 03: Full - Incremental (f16d9910)
- 02: Full - Incremental (bd2a7adf)
- 01: Full - Incremental (c505aac5)
- 00: Full (2fa14b54)
This is not a regression in JDK 23, right? Could you please adjust the affects versions in JIRA accordingly?
It is not. Fixing the version. Thanks a lot for reviewing @TobiHartmann.
Looks good to me otherwise.
Thanks for the review @TobiHartmann!
This looks OK, but isn't it a lot of changes just to get this test to pass? Aren't all of these allocation failures ultimately fatal? Is there a simpler way to handle this problem?
This looks OK, but isn't it a lot of changes just to get this test to pass? Aren't all of these allocation failures ultimately fatal? Is there a simpler way to handle this problem?
It seems a bit much indeed but I think there is potentially always the possibility of not failing but only disabling the compiler. @dean-long do you think the VM would anyway fail later on?
It may not fail, but if it can't create a C1 or C2 compiler, then that's bad, and we might argue that this kind of failure should be similar to a failure during JVM startup. In fact, I have been thinking that there are reasons why we might want these compiler stubs to be created earlier in startup, when we are still single-threaded. That would get rid of any races, and make a failure fatal. It would also allow us to allow us to put these stubs, if they are effectively execute-only because they don't need to be patched, into a special JIT region, avoiding the MAP_JIT overhead on macos-aarch64 (see JDK-8331978).
The error message is clear:
# Internal Error (codeBlob.cpp:429), pid=155190, tid=155378
# fatal error: Initial size of CodeCache is too small
I don't think we need to hide it.
I suggest instead adjust CodeCache minimal size check when JIT compilers are enabled in CompilerConfig::check_args_consistency(). We check CodeCacheMinimumUseSpace value their which is 400Kb only. It could be enough to run with -Xint (Interpreter only). But it is definitely not enough to run with JIT compilers.
I suggest instead adjust CodeCache minimal size check when JIT compilers are enabled in
CompilerConfig::check_args_consistency(). We checkCodeCacheMinimumUseSpacevalue their which is 400Kb only. It could be enough to run with-Xint(Interpreter only). But it is definitely not enough to run with JIT compilers.
@vnkozlov thanks for your insights!
I've tried to figure out an estimate of a bare minimum code cache size for interpreter/C1/C2/both for a couple of architectures (basically by running java -version):
| Int | C1 only | C2 only | C1/C2 tiered | |
|---|---|---|---|---|
| linux-x64 | 680k | 4m | 900k | 2.2m |
| macosx-aarch64 | 580k | 3m | 800k | 1.5m |
| linux-aarch64 | 520k | 3.8m | 800k | 1.5m |
(interesting that C2 needs less code cache than C1 but I guess it could also be due to the fact that its thread might actually exit before it properly starts)
800k for the interpreter alone and 5 times as much (4m) when C1 or C2 are running might be good minima. What do you think?
@dafedafe Did you run on avx512 x64 machine? We generate a lot of avx512 specific stubs for C2's intrinsics. We need to count them.
How you got 4Mb for C1 only? I got:
java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
...
CodeCache: size=49152Kb used=1084Kb max_used=1084Kb free=48067Kb
It was on old AVX2 machine.
@dafedafe Did you run on avx512 x64 machine? We generate a lot of avx512 specific stubs for C2's intrinsics. We need to count them.
Yes, I did run it on an avx512 machine. For C2 only I get:
CodeCache: size=49152Kb used=928Kb max_used=949Kb free=48223Kb
How you got 4Mb for C1 only? I got:
Strange. I get:
java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
...
CodeCache: size=49152Kb used=3922Kb max_used=3922Kb free=45229Kb
This is run on big (48 cores) avx512 box with C1 only:
$ ./jdk_git/build/product/images/jdk/bin/java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -XX:+PrintCompilation -version
32 1 1 java.lang.Object::<init> (1 bytes)
33 2 1 java.lang.String::hashCode (60 bytes)
33 3 n 0 jdk.internal.misc.Unsafe::getReferenceVolatile (native)
34 4 n 0 jdk.internal.vm.Continuation::enterSpecial (native) (static)
34 5 n 0 jdk.internal.vm.Continuation::doYield (native) (static)
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-03-1645504...)
35 6 1 java.lang.String::coder (15 bytes)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-03-1645504..., mixed mode, emulated-client, sharing)
CodeCache: size=49152Kb used=1126Kb max_used=1126Kb free=48025Kb
bounds [0x00007f4fec83f000, 0x00007f4fecaaf000, 0x00007f4fef83f000]
total_blobs=279, nmethods=6, adapters=201, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0
C2 only (no C2 compilations):
$ ./jdk_git/build/product/images/jdk/bin/java -XX:+PrintCodeCache -XX:-TieredCompilation -XX:+PrintCompilation -version
32 1 n jdk.internal.misc.Unsafe::getReferenceVolatile (native)
33 2 n jdk.internal.vm.Continuation::enterSpecial (native) (static)
33 3 n jdk.internal.vm.Continuation::doYield (native) (static)
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-03-1645504.vkozlov...)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-03-1645504.vkozlov..., mixed mode, sharing)
CodeCache: size=49152Kb used=612Kb max_used=626Kb free=48539Kb
bounds [0x00007fb817f6b000, 0x00007fb8181db000, 0x00007fb81af6b000]
total_blobs=250, nmethods=3, adapters=201, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0
Max used code cache size running a simple HelloWorld:
| Int | C1 only | C2 only | C1/C2 tiered | |
|---|---|---|---|---|
| linux-x64 (16 core, avx512) | 785Kb | 5400Kb | 1521Kb | 6219Kb |
| macosx-aarch64 (10 core) | 673Kb | 3210Kb | 1357Kb | 4701Kb |
| linux-aarch64 (16 core) | 671Kb | 6142Kb | 1462Kb | 6890Kb |
@dafedafe, can you show full output (including version lines) of java -XX:+PrintCodeCache -XX:TieredStopAtLevel=1 -version on linux-x64?
I want to see what consumes CodeCache in your experiments.
@vnkozlov here it is (sorry for the delay, I updated the repo as well but the results are similar):
build/linux-x64/jdk/bin/java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-23-1502438.dfenacci...)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-23-1502438.dfenacci..., mixed mode, emulated-client)
CodeCache: size=49152Kb used=3486Kb max_used=3486Kb free=45665Kb
bounds [0x000070510d000000, 0x000070510d370000, 0x0000705110000000]
total_blobs=916, nmethods=556, adapters=284, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0
There seem to be many more methods that get compiled with C1 than with C2 when just running -version:
build/linux-x64/jdk/bin/java
You are running Exploded JDK which does not have CDS and most likely compiling modules loading code.
You should run build/linux-x64/images/jdk/bin/java instead which is used in production.
I assume you are building jdk with make jdk-image.
You should run build/linux-x64/images/jdk/bin/java instead which is used in production.
Oh, I see! Thanks @vnkozlov! Here are the updated values:
-version
| Int | C1 only | C2 only | C1/C2 tiered | |
|---|---|---|---|---|
| linux-x64 | 515Kb | 1125Kb | 604Kb | 1200Kb |
| macosx-aarch64 | 479Kb | 1080Kb | 542Kb | 1135Kb |
| linux-aarch64 | 478Kb | 1085Kb | 542Kb | 1166Kb |
HelloWorld.java
| Int | C1 only | C2 only | C1/C2 tiered | |
|---|---|---|---|---|
| linux-x64 | 720Kb | 6956Kb | 1289Kb | 5771Kb |
| macosx-aarch64 | 609Kb | 2760Kb | 1138Kb | 3768Kb |
| linux-aarch64 | 608Kb | 6833Kb | 1155Kb | 5827Kb |
I was still puzzled by C1 values being higher than C2 ones but noticed that most of the difference is made by the C1 temporary CodeBuffer(s) (563Kb each on linux-x64).
C1 has much lower compilation threshold so it is expected it has more nmethods. And yes, it has big static temporary buffer - more cores you have, more C1 compiler threads and more these temp buffers. C2 allocates temporary buffer only during compilation and freeing it after that.
Based on that we need to scale CodeCacheMinimumUseSpace based on number of C1 compiler threads to take into account buffer size: NMethodSizeLimit (or more precise Compiler::code_buffer_size()). And may be similar for C2 even so its buffer is not permanent. Size for C2 is difficult to determine because it is calculated based on compilation information. But may be we can use the same as for C1 as rough estimation.
@dafedafe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Thanks @vnkozlov @TobiHartmann for your reviews!
/integrate
Going to push as commit 633fad8e53109bef52190494a8b171035229d2ac.
Since your change was applied there have been 37 commits pushed to the master branch:
- 6f3e3fd0d4f5e80e3fdbd26be6483c672479802a: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label
- ed422ed1a3d6cdb733bc878c4173b43eb2dfb3da: 8338817: Wrong indent in API docs for java.lang.management.ManagementFactory
- 288fa60ebee445bb2835f096d144b9c6dea98df6: 8338891: HotSpotDiagnosticsMXBean missing @since tag
- dc4fd896289db1d2f6f7bbf5795fec533448a48c: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines
- 3a88fd437dfb218df5d3338c8ee7d70416839cf8: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop()
- 62dad3a9ea222b0fbf15668d6e7b1c4ed61b2532: 8339351: Remove duplicate line in FileMapHeader::print
- 0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only
- b1163bcc88a5b88b9a56d5584310f1d679690ab2: 8256211: assert fired in java/net/httpclient/DependentPromiseActionsTest (infrequent)
- 03ba37e60ce08def6afd172efc1cdbbcc856c633: 8339169: Remove NaiveHuffman coder
- a136a85b6f5bbc92727883693c1ce31c37a82fd5: 8338768: Introduce runtime lookup to check for static builds
- ... and 27 more: https://git.openjdk.org/jdk/compare/ff59532ddd3002df61e46d58b3f29d26c78295da...master
Your commit was automatically rebased without conflicts.
@dafedafe Pushed as commit 633fad8e53109bef52190494a8b171035229d2ac.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.