
Java application in container killed after upgrading from 4.37.2 to 4.38.0

Status: Open. Opened by doberkofler; 23 comments.

Description

A container that worked as expected in 4.37.2 fails after upgrading to 4.38.0 and works again after downgrading to 4.37.2. The container runs ORDS (Oracle REST Data Services, a Java application) and is killed during startup. I already tried uninstalling and reinstalling Docker Desktop, but the problem persists. The terminal output is attached under "Additional Info".

Reproduce

The application requires a running Oracle database instance, so I'm unable to upload a reproducible example.

  1. build: docker build --rm --tag="qualiant/ords" --progress=plain --no-cache --build-arg TIMEZONE=Europe/Vienna temp_ords

  2. run: docker run --detach --name="oracleords" --network="oraclenet" --memory="1g" --memory-swap="2g" --publish="8080:8080" --mount="type=bind,source=<path>,target=/opt/ords/doc_root/lj_unittest,readonly" --env="TZ=Europe/Vienna" --hostname="ords_host" qualiant/ords

Dockerfile.zip

Expected behavior

The application should run the same way in 4.38.0 as it did in 4.37.2.

docker version

Client:
 Version:           27.5.1
 API version:       1.47
 Go version:        go1.22.11
 Git commit:        9f9e405
 Built:             Wed Jan 22 13:37:19 2025
 OS/Arch:           darwin/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.38.0 (181591)
 Engine:
  Version:          27.5.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.11
  Git commit:       4c9b3b0
  Built:            Wed Jan 22 13:41:17 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.25
  GitCommit:        bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e946
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    27.5.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Ask Gordon - Docker Agent (Docker Inc.)
    Version:  v0.7.3
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.20.1-desktop.2
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.32.4-desktop.1
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.38
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Beta) (Docker Inc.)
    Version:  v0.1.4
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.16.1
    Path:     /Users/doberkofler/.docker/cli-plugins/docker-scout

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 27.5.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc version: v1.1.12-0-g51d5e946
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.12.5-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 7.655GiB
 Name: docker-desktop
 ID: ffb78d3a-3f32-4cf3-b78a-446b91027275
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/doberkofler/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostics ID

78532B66-40F5-4EED-B8F4-D9264E84F790/20250131161153

Additional Info

2025-02-03 09:40:51 + ./install/bin/ords --config ./config serve
2025-02-03 09:40:56 2025-02-03T08:40:56.353Z INFO        HTTP and HTTP/2 cleartext listening on host: 0.0.0.0 port: 8080
2025-02-03 09:40:56 2025-02-03T08:40:56.416Z INFO        Disabling document root because the specified folder does not exist: ./config/global/doc_root
2025-02-03 09:40:55 
2025-02-03 09:40:55 ORDS: Release 23.4 Production on Mon Feb 03 08:40:55 2025
2025-02-03 09:40:55 
2025-02-03 09:40:55 Copyright (c) 2010, 2025, Oracle.
2025-02-03 09:40:55 
2025-02-03 09:40:55 Configuration:
2025-02-03 09:40:55   /opt/ords/config/
2025-02-03 09:40:55 
2025-02-03 09:40:56 2025-02-03T08:40:56.417Z INFO        Default forwarding from / to contextRoot configured.
2025-02-03 09:41:02 2025-02-03T08:41:02.884Z INFO        Configuration properties for: |lj_unittest|lo|
2025-02-03 09:41:02 db.servicename=TEST
2025-02-03 09:41:02 java.specification.version=23
2025-02-03 09:41:02 conf.use.wallet=true
2025-02-03 09:41:02 sun.jnu.encoding=UTF-8
2025-02-03 09:41:02 user.region=US
2025-02-03 09:41:02 standalone.static.context.path=/q/p/lj_unittest
2025-02-03 09:41:02 java.class.path=/opt/ords/install/ords.war
2025-02-03 09:41:02 java.vm.vendor=Amazon.com Inc.
2025-02-03 09:41:02 sun.arch.data.model=64
2025-02-03 09:41:02 standalone.static.path=/opt/ords/doc_root/lj_unittest
2025-02-03 09:41:02 nashorn.args=--no-deprecation-warning
2025-02-03 09:41:02 java.vendor.url=https://aws.amazon.com/corretto/
2025-02-03 09:41:02 resource.templates.enabled=false
2025-02-03 09:41:02 user.timezone=UTC
2025-02-03 09:41:02 db.port=1521
2025-02-03 09:41:02 debug.printDebugToScreen=true
2025-02-03 09:41:02 java.vm.specification.version=23
2025-02-03 09:41:02 os.name=Linux
2025-02-03 09:41:02 sun.java.launcher=SUN_STANDARD
2025-02-03 09:41:02 user.country=US
2025-02-03 09:41:02 sun.boot.library.path=/usr/lib/jvm/java-23-amazon-corretto/lib
2025-02-03 09:41:02 sun.java.command=/opt/ords/install/ords.war --config ./config serve
2025-02-03 09:41:02 jdk.debug=release
2025-02-03 09:41:02 sun.cpu.endian=little
2025-02-03 09:41:02 user.home=/root
2025-02-03 09:41:02 oracle.dbtools.launcher.executable.jar.path=/opt/ords/install/ords.war
2025-02-03 09:41:02 user.language=en
2025-02-03 09:41:02 java.specification.vendor=Oracle Corporation
2025-02-03 09:41:02 misc.defaultPage=las_dlg_startup.go
2025-02-03 09:41:02 java.version.date=2025-01-21
2025-02-03 09:41:02 database.api.enabled=true
2025-02-03 09:41:02 java.home=/usr/lib/jvm/java-23-amazon-corretto
2025-02-03 09:41:02 db.username=LJ_UNITTEST
2025-02-03 09:41:02 owa.docTable=LJP_Documents
2025-02-03 09:41:02 file.separator=/
2025-02-03 09:41:02 java.vm.compressedOopsMode=32-bit
2025-02-03 09:41:02 line.separator=
2025-02-03 09:41:02 
2025-02-03 09:41:02 restEnabledSql.active=false
2025-02-03 09:41:02 java.specification.name=Java Platform API Specification
2025-02-03 09:41:02 java.vm.specification.vendor=Oracle Corporation
2025-02-03 09:41:02 feature.sdw=false
2025-02-03 09:41:02 java.awt.headless=true
2025-02-03 09:41:02 standalone.context.path=/ords
2025-02-03 09:41:02 db.hostname=oracledb
2025-02-03 09:41:02 db.password=******
2025-02-03 09:41:02 sun.management.compiler=HotSpot 64-Bit Tiered Compilers
2025-02-03 09:41:02 security.requestValidationFunction=LAS_SYS_Authorize.authorize
2025-02-03 09:41:02 java.runtime.version=23.0.2+7-FR
2025-02-03 09:41:02 user.name=root
2025-02-03 09:41:02 stdout.encoding=UTF-8
2025-02-03 09:41:02 path.separator=:
2025-02-03 09:41:02 standalone.http.port=8080
2025-02-03 09:41:02 os.version=6.12.5-linuxkit
2025-02-03 09:41:02 java.runtime.name=OpenJDK Runtime Environment
2025-02-03 09:41:02 file.encoding=UTF8
2025-02-03 09:41:02 plsql.gateway.mode=direct
2025-02-03 09:41:02 security.externalSessionTrustedOrigins=*
2025-02-03 09:41:02 java.vm.name=OpenJDK 64-Bit Server VM
2025-02-03 09:41:02 java.vendor.version=Corretto-23.0.2.7.1
2025-02-03 09:41:02 java.vendor.url.bug=https://github.com/corretto/corretto-23/issues/
2025-02-03 09:41:02 java.io.tmpdir=/tmp
2025-02-03 09:41:02 oracle.dbtools.cmdline.ShellCommand=ords
2025-02-03 09:41:02 java.version=23.0.2
2025-02-03 09:41:02 user.dir=/opt/ords
2025-02-03 09:41:02 os.arch=amd64
2025-02-03 09:41:02 java.vm.specification.name=Java Virtual Machine Specification
2025-02-03 09:41:02 jdbc.MaxLimit=100
2025-02-03 09:41:02 oracle.dbtools.cmdline.home=/opt/ords/install
2025-02-03 09:41:02 native.encoding=UTF-8
2025-02-03 09:41:02 java.library.path=/usr/lib/jvm/java-23-amazon-corretto/lib/server:/usr/lib/jvm/java-23-amazon-corretto/lib:/usr/lib/jvm/java-23-amazon-corretto/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2025-02-03 09:41:02 java.vendor=Amazon.com Inc.
2025-02-03 09:41:02 java.vm.info=mixed mode, sharing
2025-02-03 09:41:02 stderr.encoding=UTF-8
2025-02-03 09:41:02 java.vm.version=23.0.2+7-FR
2025-02-03 09:41:02 sun.io.unicode.encoding=UnicodeLittle
2025-02-03 09:41:02 jdbc.InitialLimit=10
2025-02-03 09:41:02 db.connectionType=basic
2025-02-03 09:41:02 java.class.version=67.0
2025-02-03 09:41:02 standalone.access.log=config/logs
2025-02-03 09:41:02 
2025-02-03 09:41:22 ./install/bin/ords: line 222:    39 Killed                  ${JAVA} "${APP_VM_OPTS[@]}" ${ORDS_DEBUG} -jar "${ORDS_HOME}"/${ORDS_WAR} ${ORDS_VERBOSE} "$@"

doberkofler, Jan 31, 2025

I'm also having issues with some Java-based containers. It seems limited to Java 17 in my case; a similar application on Java 21 does not have the issue.

This is one of the errors I'm getting:

2025-01-31 15:12:55,705 ERROR [41 lity] FixedSizeBlockingPool [] Pool object could not be added due to exception: java.lang.NullPointerException: Cannot invoke "jdk.internal.platform.CgroupInfo.getMountPoint()" because "anyController" is null [ ]
Exception in thread "Native-Process-Pool-1-17" java.lang.NullPointerException: Cannot invoke "jdk.internal.platform.CgroupInfo.getMountPoint()" because "anyController" is null
    at java.base/jdk.internal.platform.cgroupv2.CgroupV2Subsystem.getInstance(CgroupV2Subsystem.java:80)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:114)
    at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:177)
    at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
    at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
    at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
    at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:182)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:280)

JasringStw, Feb 3, 2025

@JasringStw I tried several different JDK and Linux versions, but at least for me the error persists.

doberkofler, Feb 3, 2025

I just discovered that the problem seems to be caused by the --memory="1g" option. When I remove the memory options, the app also works in 4.38.0, but raising the limit to 2g still causes the app to quit.

doberkofler, Feb 3, 2025
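A quick way to see whether the JVM honors the --memory limit is to compare the heap ceiling it chose with the limit the kernel is actually enforcing. The sketch below is a hypothetical diagnostic, not part of ORDS or Docker: the class name, the "more than half the limit" threshold, and the /sys/fs/cgroup/memory.max location (typical for a cgroup v2 container such as those on Docker Desktop) are all assumptions. It reads memory.max, which holds either a byte count or the literal max, and compares it with Runtime.getRuntime().maxMemory().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Hypothetical diagnostic: compare the container's cgroup v2 memory limit with
// the JVM's heap ceiling. Assumes the container's own cgroup is mounted at
// /sys/fs/cgroup (cgroup v2 with a private cgroup namespace).
final class CgroupMemoryCheck {
    private static final long MIB = 1024L * 1024L;

    // memory.max contains either a byte count or the literal "max" (no limit).
    static OptionalLong parseMemoryMax(String content) {
        String value = content.trim();
        if (value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    // Assumed heuristic: with working container detection the default heap is
    // about a quarter of the limit, so a ceiling above half of it is suspicious
    // unless -Xmx (or a similar flag) was set on purpose.
    static boolean heapLooksUnbounded(long maxHeapBytes, long limitBytes) {
        return maxHeapBytes > limitBytes / 2;
    }

    public static void main(String[] args) throws IOException {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("JVM max heap: " + maxHeap / MIB + " MiB");

        Path memoryMax = Path.of("/sys/fs/cgroup/memory.max");
        if (!Files.isReadable(memoryMax)) {
            System.out.println("No readable " + memoryMax + " (not cgroup v2, or not in a container?)");
            return;
        }
        OptionalLong limit = parseMemoryMax(Files.readString(memoryMax));
        if (limit.isEmpty()) {
            System.out.println("No cgroup memory limit set (memory.max is \"max\")");
            return;
        }
        long limitBytes = limit.getAsLong();
        System.out.println("cgroup memory limit: " + limitBytes / MIB + " MiB");
        if (heapLooksUnbounded(maxHeap, limitBytes)) {
            System.out.println("WARNING: heap ceiling is large relative to the limit; "
                    + "the JVM may not have detected the container limit");
        }
    }
}
```

With a JDK 11 or newer in the image, the file can be run directly as java CgroupMemoryCheck.java inside a container started with the same --memory flag. Because it starts a fresh JVM with default settings, it should see the limit the same way the application's JVM does.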

Docker Desktop 4.38.0 uses Linux kernel 6.12. According to the linked JDK bug, some cgroup controllers have been moved in that kernel, so the JDK no longer detects the cgroup memory controller. As a result, the JDK calculates its memory limits from the total memory available instead of the limit defined by Docker via cgroups.

https://bugs.openjdk.org/browse/JDK-8348566

As a workaround, set a fixed heap size with -Xmx2G or similar, sized to fit within the container's memory limit. Sadly, the JDK bug only has priority P4.

It would be great if Docker for Mac could revert to a kernel prior to 6.12 until the JDK is fixed. The bug broke our development setup because all Java-based containers were restarted constantly (OOM-killed because Java used more memory than cgroups allowed).

jnehlmeier, Feb 3, 2025
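The arithmetic behind the OOM kills can be sketched as follows. Assuming the JVM's default MaxRAMPercentage of 25% (and ignoring HotSpot's special handling of very small or very large memory sizes), the default heap ceiling is about a quarter of whatever memory the JVM believes it has. The figures plugged in are the 1g and 2g limits and the 7.655 GiB VM total from the docker info output in this issue; the class name is hypothetical.

```java
// Worked arithmetic (sketch): default heap ceiling when the JVM sees the cgroup
// limit versus when it falls back to the VM's total memory.
// Assumes the default MaxRAMPercentage of 25% and ignores HotSpot's special
// cases for very small or very large memory sizes.
final class DefaultHeapEstimate {
    static final long MIB = 1024L * 1024L;
    static final double DEFAULT_MAX_RAM_PERCENTAGE = 25.0;

    // Approximate default maximum heap for the memory size the JVM detects.
    static long estimateDefaultMaxHeap(long visibleMemoryBytes) {
        return (long) (visibleMemoryBytes * DEFAULT_MAX_RAM_PERCENTAGE / 100.0);
    }

    public static void main(String[] args) {
        long vmMemory = (long) (7.655 * 1024 * MIB); // "Total Memory: 7.655GiB" from docker info
        long[] limits = {1024 * MIB, 2048 * MIB};    // --memory="1g" and the 2g retry

        for (long limit : limits) {
            long detected = estimateDefaultMaxHeap(limit);      // limit seen by the JVM
            long undetected = estimateDefaultMaxHeap(vmMemory); // JVM falls back to VM total
            System.out.printf("limit %d MiB: heap ~%d MiB if detected, ~%d MiB if not%n",
                    limit / MIB, detected / MIB, undetected / MIB);
        }
    }
}
```

With detection the ceiling is about 256 MiB (1g) or 512 MiB (2g); without it, about 1.9 GiB in both cases. At 2g, a heap that large leaves little room for metaspace, thread stacks, and the code cache, which is consistent with the report that raising the limit to 2g did not help. It also suggests why an absolute -Xmx is the safer workaround: percentage-based flags such as -XX:MaxRAMPercentage are applied to the memory size the JVM detects, which is the wrong number here.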

I also have a problem running Elasticsearch with Docker Desktop 4.38.0. The easiest way to reproduce it is to run:

docker run docker.elastic.co/elasticsearch/elasticsearch:7.16.1

It fails with the error:

Could not reconfigure JMX java.lang.NullPointerException: Cannot invoke "jdk.internal.platform.CgroupInfo.getMountPoint()" because "anyController" is null

TomasLudvik, Feb 3, 2025

I just built one of my images with a newer Java update release (17.0.14, up from 17.0.4), and that seems to solve the issue.

JasringStw, Feb 3, 2025

My container was based on JDK 23 (amazoncorretto:23-alpine).

doberkofler, Feb 3, 2025

There are two different issues mentioned here, both caused by changes in kernel 6.12:

  1. If your code uses JMX and OperatingSystemMXBean, you might get a NullPointerException because Java calls its internal cgroups API and does not expect a controller lookup to return null. This might be fixed in newer Java versions, as reported by @JasringStw.
  2. If you set cgroup memory limits when running the Docker image, Java ignores the limit, and the kernel may kill your process because Java uses more memory than the cgroup allows. This issue might exist in JDK 17 but definitely exists in JDK 21+ (I tried 21 and 23).

jnehlmeier, Feb 3, 2025
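For the first issue, code you control can at least avoid crashing when the platform OperatingSystemMXBean cannot be created. The sketch below wraps the lookup and treats any RuntimeException, such as the NullPointerException from jdk.internal.platform in the stack traces in this thread, as "metrics unavailable". The class and method names are hypothetical. It does not fix the JDK: libraries that query the MXBean themselves (as Elasticsearch's JMX setup appears to) still need an updated JDK or the older kernel.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical guard: keep application code running when the platform
// OperatingSystemMXBean cannot be created on an affected JDK/kernel combination.
// This only masks the symptom in your own code.
final class SafeOsMetrics {

    // Runs the supplier and turns any RuntimeException (for example the
    // NullPointerException thrown from jdk.internal.platform) into an empty result.
    static <T> Optional<T> tryGet(Supplier<T> supplier) {
        try {
            return Optional.ofNullable(supplier.get());
        } catch (RuntimeException e) {
            System.err.println("OS metrics unavailable: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<OperatingSystemMXBean> os = tryGet(ManagementFactory::getOperatingSystemMXBean);
        os.ifPresentOrElse(
                bean -> System.out.println("Available processors: " + bean.getAvailableProcessors()),
                () -> System.out.println("Skipping OS metrics"));
    }
}
```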

Here is the main issue for OpenJDK: https://bugs.openjdk.org/browse/JDK-8347129

The issue page also lists the current backports.

jnehlmeier, Feb 4, 2025

The Linux kernel in 4.38.0 is version 6.12.5. See also: https://stackoverflow.com/questions/71532170/java-lang-nullpointerexception-cannot-invoke-jdk-internal-platform-cgroupinfo/71585718#71585718

zoran-liu, Feb 6, 2025

This is likely to be a serious issue for pretty much anyone using Java in Docker, and it probably warrants downgrading the kernel to the version used in 4.37.2 until the fix reaches Java releases and applications have updated their images to use them.

Comparing the most recent Java 11 build across Docker Desktop versions by running java -Xlog:os+container=trace -version inside a container:

On Docker 4.38.0:

[0.003s][trace][os,container] OSContainer::init: Initializing Container Support
[0.003s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.003s][debug][os,container] controller cpu is not enabled
[0.003s][debug][os,container] controller memory is not enabled
[0.003s][debug][os,container] One or more required controllers disabled at kernel level.
java version "11.0.26" 2025-01-21 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.26+7-LTS-187)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.26+7-LTS-187, mixed mode)

On Docker 4.37.2:

[0.003s][trace][os,container] OSContainer::init: Initializing Container Support
[0.003s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.003s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.004s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/d4d52354-948f-4cfa-bf7c-37fab0dce2e0/cpu.max
[0.004s][trace][os,container] Raw value for CPU quota is: max
....
java version "11.0.26" 2025-01-21 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.26+7-LTS-187)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.26+7-LTS-187, mixed mode)

mccaig, Feb 11, 2025
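The 4.38.0 trace shows the JDK concluding that the cpu and memory controllers are "not enabled" after reading /proc/cgroups. The sketch below (hypothetical class name; it assumes the usual four-column /proc/cgroups layout of subsys_name, hierarchy, num_cgroups, enabled) prints which controllers that file reports as enabled, next to the contents of the cgroup v2 cgroup.controllers file. It only approximates what the JDK checks and is meant for side-by-side comparison between Docker Desktop versions, alongside the -Xlog:os+container=trace output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic: show what /proc/cgroups reports, which is what the
// JDK's container detection inspects, and what the cgroup v2 hierarchy offers.
// Assumes the four-column format: #subsys_name hierarchy num_cgroups enabled
final class ProcCgroupsCheck {

    // Returns controllers whose "enabled" column is 1; unlisted ones are absent.
    static Set<String> enabledControllers(String procCgroups) {
        Set<String> enabled = new TreeSet<>();
        for (String line : procCgroups.split("\n")) {
            if (line.isBlank() || line.startsWith("#")) {
                continue; // skip the header and empty lines
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 4 && cols[3].equals("1")) {
                enabled.add(cols[0]);
            }
        }
        return enabled;
    }

    public static void main(String[] args) throws IOException {
        Set<String> enabled = enabledControllers(Files.readString(Path.of("/proc/cgroups")));
        System.out.println("Enabled in /proc/cgroups: " + enabled);
        for (String required : new String[] {"cpu", "memory"}) {
            if (!enabled.contains(required)) {
                System.out.println("'" + required + "' not listed as enabled; affected JDKs "
                        + "may then disable container support");
            }
        }
        Path v2Controllers = Path.of("/sys/fs/cgroup/cgroup.controllers");
        if (Files.isReadable(v2Controllers)) {
            System.out.println("cgroup v2 controllers: " + Files.readString(v2Controllers).trim());
        }
    }
}
```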

Thanks for the report and repro instructions. I've created a developer build which has the older kernel:

  • Intel: https://desktop-stage.docker.com/mac/main/amd64/182688/Docker.dmg
  • Apple Silicon: https://desktop-stage.docker.com/mac/main/arm64/182688/Docker.dmg

Let me know if these builds help (or not!)

djs55, Feb 12, 2025

I can confirm this version works correctly for us. JVM auto-configuration is applied again: the heap is sized from the requests/limits in k3d, or from the container limits when started with docker run ….

davinkevin, Feb 12, 2025

It would be great to have a workaround (besides downgrading), as this is crippling work for many Java developers.

michbsd, Feb 12, 2025

@djs55 Yes, this build seems to resolve the problem. Thank you!

doberkofler, Feb 12, 2025

@doberkofler @davinkevin @michbsd thanks very much for the quick tests and confirmation that the kernel downgrade works.

@michbsd does the development build unblock your work for the moment? It should be safe to use.

djs55, Feb 13, 2025

Yes, the development build is working for me, @djs55. Thank you!

michbsd, Feb 13, 2025

The development build fixes the problem I was seeing too (another NPE in jdk.internal.platform.CgroupInfo.getMountPoint()).

iay, Feb 14, 2025

@djs55 thanks, the development build resolves the issue:

$ java -Xlog:os+container=trace -version
[0.001s][trace][os,container] OSContainer::init: Initializing Container Support
[0.001s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
[0.001s][debug][os,container] Detected cgroups v2 unified hierarchy
[0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup/cpu.max
[0.001s][trace][os,container] Raw value for CPU quota is: 100000
java version "11.0.26" 2025-01-21 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.26+7-LTS-187)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.26+7-LTS-187, mixed mode)

mccaig, Feb 18, 2025

@djs55 thank you for providing the dev build. Was the kernel downgrade a one-off to help people work around the issue, or is it going to make its way into the next Docker Desktop release?

davetroiano, Feb 20, 2025

@davetroiano the downgrade has been merged for the next release, and we've proposed a release note:

- Downgraded Linux kernel to v6.10.14 to fix an OpenJDK bug that caused Java containers to terminate due to cgroups controller misidentification. See https://github.com/docker/for-mac/issues/7573.

djs55, Feb 21, 2025

The developer build also worked for me.

hivenet-maximeweyl, Mar 1, 2025

Docker Desktop 4.39.0 (184744) fixes this issue. I just tested it and it works without problems.

tmoreira2020, Mar 5, 2025

Seems like it is fixed already. Close it :)

MalinrRuwan, Mar 30, 2025