jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)

Open dafedafe opened this issue 1 year ago • 18 comments

Issue

The test compiler/startup/StartupOutput.java fails intermittently due to a crash after correctly printing the error Initial size of CodeCache is too small (the test limits the code cache using k-XX:InitialCodeCacheSize=1024K -XX:ReservedCodeCacheSize=1200k`). The appearance of the issue is very dependent on thread scheduling. The original report happens during C1 initialization but C2 initialization is affected as well.

Causes

There is one occurrence during C1 initialization and one during C2 initialization where a call to RuntimeStub::new_runtime_stub can fail fatally if there is not enough space left. For C1: Compiler::init_c1_runtime -> Runtime1::initialize -> Runtime1::generate_blob_for -> Runtime1::generate_blob -> RuntimeStub::new_runtime_stub. For C2: C2Compiler::initialize -> OptoRuntime::generate -> OptoRuntime::generate_stub -> Compile::Compile -> Compile::Code_Gen -> PhaseOutput::install -> PhaseOutput::install_stub -> RuntimeStub::new_runtime_stub.

Solution

https://github.com/openjdk/jdk/pull/15970 introduced an optional argument to RuntimeStub::new_runtime_stub to determine if it fails fatally or not. We can take advantage of it to avoid crashing and instead pass the information about the success or failure of the allocation up the (C1 and C2 initialization) call stack up to where we can set the compilations as failed.


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [ ] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) (Bug - P4)(⚠️ The fixVersion in this issue is [24] but the fixVersion in .jcheck/conf is 23, a new backport will be created when this pr is integrated.)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19280/head:pull/19280
$ git checkout pull/19280

Update a local copy of the PR:
$ git checkout pull/19280
$ git pull https://git.openjdk.org/jdk.git pull/19280/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19280

View PR using the GUI difftool:
$ git pr show -t 19280

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19280.diff

Webrev

Link to Webrev Comment

dafedafe avatar May 17 '24 09:05 dafedafe

:wave: Welcome back dfenacci! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar May 17 '24 09:05 bridgekeeper[bot]

@dafedafe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)

Reviewed-by: thartmann, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 37 new commits pushed to the master branch:

  • 6f3e3fd0d4f5e80e3fdbd26be6483c672479802a: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label
  • ed422ed1a3d6cdb733bc878c4173b43eb2dfb3da: 8338817: Wrong indent in API docs for java.lang.management.ManagementFactory
  • 288fa60ebee445bb2835f096d144b9c6dea98df6: 8338891: HotSpotDiagnosticsMXBean missing @since tag
  • dc4fd896289db1d2f6f7bbf5795fec533448a48c: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines
  • 3a88fd437dfb218df5d3338c8ee7d70416839cf8: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop()
  • 62dad3a9ea222b0fbf15668d6e7b1c4ed61b2532: 8339351: Remove duplicate line in FileMapHeader::print
  • 0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only
  • b1163bcc88a5b88b9a56d5584310f1d679690ab2: 8256211: assert fired in java/net/httpclient/DependentPromiseActionsTest (infrequent)
  • 03ba37e60ce08def6afd172efc1cdbbcc856c633: 8339169: Remove NaiveHuffman coder
  • a136a85b6f5bbc92727883693c1ce31c37a82fd5: 8338768: Introduce runtime lookup to check for static builds
  • ... and 27 more: https://git.openjdk.org/jdk/compare/ff59532ddd3002df61e46d58b3f29d26c78295da...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar May 17 '24 09:05 openjdk[bot]

@dafedafe this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8326615
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk[bot] avatar May 17 '24 09:05 openjdk[bot]

@dafedafe The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar May 17 '24 09:05 openjdk[bot]

This is not a regression in JDK 23, right? Could you please adjust the affects versions in JIRA accordingly?

It is not. Fixing the version. Thanks a lot for reviewing @TobiHartmann.

dafedafe avatar May 21 '24 09:05 dafedafe

Looks good to me otherwise.

Thanks for the review @TobiHartmann!

dafedafe avatar May 28 '24 07:05 dafedafe

This looks OK, but isn't it a lot of changes just to get this test to pass? Aren't all of these allocation failures ultimately fatal? Is there a simpler way to handle this problem?

dean-long avatar May 30 '24 22:05 dean-long

This looks OK, but isn't it a lot of changes just to get this test to pass? Aren't all of these allocation failures ultimately fatal? Is there a simpler way to handle this problem?

It seems a bit much indeed but I think there is potentially always the possibility of not failing but only disabling the compiler. @dean-long do you think the VM would anyway fail later on?

dafedafe avatar Jun 04 '24 07:06 dafedafe

It may not fail, but if it can't create a C1 or C2 compiler, then that's bad, and we might argue that this kind of failure should be similar to a failure during JVM startup. In fact, I have been thinking that there are reasons why we might want these compiler stubs to be created earlier in startup, when we are still single-threaded. That would get rid of any races, and make a failure fatal. It would also allow us to allow us to put these stubs, if they are effectively execute-only because they don't need to be patched, into a special JIT region, avoiding the MAP_JIT overhead on macos-aarch64 (see JDK-8331978).

dean-long avatar Jun 04 '24 19:06 dean-long

The error message is clear:

#  Internal Error (codeBlob.cpp:429), pid=155190, tid=155378
#  fatal error: Initial size of CodeCache is too small

I don't think we need to hide it.

vnkozlov avatar Jun 11 '24 17:06 vnkozlov

I suggest instead adjust CodeCache minimal size check when JIT compilers are enabled in CompilerConfig::check_args_consistency(). We check CodeCacheMinimumUseSpace value their which is 400Kb only. It could be enough to run with -Xint (Interpreter only). But it is definitely not enough to run with JIT compilers.

vnkozlov avatar Jun 11 '24 17:06 vnkozlov

I suggest instead adjust CodeCache minimal size check when JIT compilers are enabled in CompilerConfig::check_args_consistency(). We check CodeCacheMinimumUseSpace value their which is 400Kb only. It could be enough to run with -Xint (Interpreter only). But it is definitely not enough to run with JIT compilers.

@vnkozlov thanks for your insights! I've tried to figure out an estimate of a bare minimum code cache size for interpreter/C1/C2/both for a couple of architectures (basically by running java -version):

Int C1 only C2 only C1/C2 tiered
linux-x64 680k 4m 900k 2.2m
macosx-aarch64 580k 3m 800k 1.5m
linux-aarch64 520k 3.8m 800k 1.5m

(interesting that C2 needs less code cache than C1 but I guess it could also be due to the fact that its thread might actually exit before it properly starts)

800k for the interpreter alone and 5 times as much (4m) when C1 or C2 are running might be good minima. What do you think?

dafedafe avatar Jul 02 '24 06:07 dafedafe

@dafedafe Did you run on avx512 x64 machine? We generate a lot of avx512 specific stubs for C2's intrinsics. We need to count them.

How you got 4Mb for C1 only? I got:

java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
...
CodeCache: size=49152Kb used=1084Kb max_used=1084Kb free=48067Kb

vnkozlov avatar Jul 02 '24 18:07 vnkozlov

It was on old AVX2 machine.

vnkozlov avatar Jul 02 '24 18:07 vnkozlov

@dafedafe Did you run on avx512 x64 machine? We generate a lot of avx512 specific stubs for C2's intrinsics. We need to count them.

Yes, I did run it on an avx512 machine. For C2 only I get:

CodeCache: size=49152Kb used=928Kb max_used=949Kb free=48223Kb

How you got 4Mb for C1 only? I got:

Strange. I get:

java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
...
CodeCache: size=49152Kb used=3922Kb max_used=3922Kb free=45229Kb

dafedafe avatar Jul 03 '24 14:07 dafedafe

This is run on big (48 cores) avx512 box with C1 only:

$ ./jdk_git/build/product/images/jdk/bin/java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -XX:+PrintCompilation -version
32    1       1       java.lang.Object::<init> (1 bytes)
33    2       1       java.lang.String::hashCode (60 bytes)
33    3     n 0       jdk.internal.misc.Unsafe::getReferenceVolatile (native)   
34    4     n 0       jdk.internal.vm.Continuation::enterSpecial (native)   (static)
34    5     n 0       jdk.internal.vm.Continuation::doYield (native)   (static)
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-03-1645504...)
35    6       1       java.lang.String::coder (15 bytes)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-03-1645504..., mixed mode, emulated-client, sharing)
CodeCache: size=49152Kb used=1126Kb max_used=1126Kb free=48025Kb
 bounds [0x00007f4fec83f000, 0x00007f4fecaaf000, 0x00007f4fef83f000]
 total_blobs=279, nmethods=6, adapters=201, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0

vnkozlov avatar Jul 03 '24 16:07 vnkozlov

C2 only (no C2 compilations):

$ ./jdk_git/build/product/images/jdk/bin/java -XX:+PrintCodeCache -XX:-TieredCompilation -XX:+PrintCompilation -version
32    1     n       jdk.internal.misc.Unsafe::getReferenceVolatile (native)   
33    2     n       jdk.internal.vm.Continuation::enterSpecial (native)   (static)
33    3     n       jdk.internal.vm.Continuation::doYield (native)   (static)
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-03-1645504.vkozlov...)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-03-1645504.vkozlov..., mixed mode, sharing)
CodeCache: size=49152Kb used=612Kb max_used=626Kb free=48539Kb
 bounds [0x00007fb817f6b000, 0x00007fb8181db000, 0x00007fb81af6b000]
 total_blobs=250, nmethods=3, adapters=201, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0

vnkozlov avatar Jul 03 '24 17:07 vnkozlov

Max used code cache size running a simple HelloWorld:

Int C1 only C2 only C1/C2 tiered
linux-x64 (16 core, avx512) 785Kb 5400Kb 1521Kb 6219Kb
macosx-aarch64 (10 core) 673Kb 3210Kb 1357Kb 4701Kb
linux-aarch64 (16 core) 671Kb 6142Kb 1462Kb 6890Kb

dafedafe avatar Jul 04 '24 12:07 dafedafe

@dafedafe, can you show full output (including version lines) of java -XX:+PrintCodeCache -XX:TieredStopAtLevel=1 -version on linux-x64? I want to see what consumes CodeCache in your experiments.

vnkozlov avatar Jul 09 '24 15:07 vnkozlov

@vnkozlov here it is (sorry for the delay, I updated the repo as well but the results are similar):

build/linux-x64/jdk/bin/java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -version
java version "24-internal" 2025-03-18
Java(TM) SE Runtime Environment (build 24-internal-2024-07-23-1502438.dfenacci...)
Java HotSpot(TM) 64-Bit Server VM (build 24-internal-2024-07-23-1502438.dfenacci..., mixed mode, emulated-client)
CodeCache: size=49152Kb used=3486Kb max_used=3486Kb free=45665Kb
 bounds [0x000070510d000000, 0x000070510d370000, 0x0000705110000000]
 total_blobs=916, nmethods=556, adapters=284, full_count=0
Compilation: enabled, stopped_count=0, restarted_count=0

There seem to be many more methods that get compiled with C1 than with C2 when just running -version:

  • java -XX:+PrintCodeCache -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -XX:+PrintCompilation -version:C1.log

  • java -XX:+PrintCodeCache -XX:-TieredCompilation -XX:+PrintCompilation -version: C2.log

dafedafe avatar Jul 23 '24 15:07 dafedafe

build/linux-x64/jdk/bin/java

You are running Exploded JDK which does not have CDS and most likely compiling modules loading code. You should run build/linux-x64/images/jdk/bin/java instead which is used in production. I assume you are building jdk with make jdk-image.

vnkozlov avatar Jul 23 '24 16:07 vnkozlov

You should run build/linux-x64/images/jdk/bin/java instead which is used in production.

Oh, I see! Thanks @vnkozlov! Here are the updated values:

-version

Int C1 only C2 only C1/C2 tiered
linux-x64 515Kb 1125Kb 604Kb 1200Kb
macosx-aarch64 479Kb 1080Kb 542Kb 1135Kb
linux-aarch64 478Kb 1085Kb 542Kb 1166Kb

HelloWorld.java

Int C1 only C2 only C1/C2 tiered
linux-x64 720Kb 6956Kb 1289Kb 5771Kb
macosx-aarch64 609Kb 2760Kb 1138Kb 3768Kb
linux-aarch64 608Kb 6833Kb 1155Kb 5827Kb

I was still puzzled by C1 values being higher than C2 ones but noticed that most of the difference is made by the C1 temporary CodeBuffer(s) (563Kb each on linux-x64).

dafedafe avatar Jul 24 '24 13:07 dafedafe

C1 has much lower compilation threshold so it is expected it has more nmethods. And yes, it has big static temporary buffer - more cores you have, more C1 compiler threads and more these temp buffers. C2 allocates temporary buffer only during compilation and freeing it after that.

vnkozlov avatar Jul 24 '24 19:07 vnkozlov

Based on that we need to scale CodeCacheMinimumUseSpace based on number of C1 compiler threads to take into account buffer size: NMethodSizeLimit (or more precise Compiler::code_buffer_size()). And may be similar for C2 even so its buffer is not permanent. Size for C2 is difficult to determine because it is calculated based on compilation information. But may be we can use the same as for C1 as rough estimation.

vnkozlov avatar Jul 24 '24 19:07 vnkozlov

@dafedafe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Aug 22 '24 00:08 bridgekeeper[bot]

Thanks @vnkozlov @TobiHartmann for your reviews!

dafedafe avatar Sep 03 '24 09:09 dafedafe

/integrate

dafedafe avatar Sep 03 '24 09:09 dafedafe

Going to push as commit 633fad8e53109bef52190494a8b171035229d2ac. Since your change was applied there have been 37 commits pushed to the master branch:

  • 6f3e3fd0d4f5e80e3fdbd26be6483c672479802a: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label
  • ed422ed1a3d6cdb733bc878c4173b43eb2dfb3da: 8338817: Wrong indent in API docs for java.lang.management.ManagementFactory
  • 288fa60ebee445bb2835f096d144b9c6dea98df6: 8338891: HotSpotDiagnosticsMXBean missing @since tag
  • dc4fd896289db1d2f6f7bbf5795fec533448a48c: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines
  • 3a88fd437dfb218df5d3338c8ee7d70416839cf8: 8334724: C2: remove PhaseIdealLoop::cast_incr_before_loop()
  • 62dad3a9ea222b0fbf15668d6e7b1c4ed61b2532: 8339351: Remove duplicate line in FileMapHeader::print
  • 0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only
  • b1163bcc88a5b88b9a56d5584310f1d679690ab2: 8256211: assert fired in java/net/httpclient/DependentPromiseActionsTest (infrequent)
  • 03ba37e60ce08def6afd172efc1cdbbcc856c633: 8339169: Remove NaiveHuffman coder
  • a136a85b6f5bbc92727883693c1ce31c37a82fd5: 8338768: Introduce runtime lookup to check for static builds
  • ... and 27 more: https://git.openjdk.org/jdk/compare/ff59532ddd3002df61e46d58b3f29d26c78295da...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Sep 03 '24 09:09 openjdk[bot]

@dafedafe Pushed as commit 633fad8e53109bef52190494a8b171035229d2ac.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar Sep 03 '24 09:09 openjdk[bot]